Abstract

The Compressive Sensing (CS) framework aims to ease the burden on analog-to-digital
converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse
signals. Practical ADCs not only sample but also quantize each measurement to a finite number 
of bits; moreover, there is an inverse relationship between the achievable sampling rate and the 
bit-depth. In this paper, we investigate an alternative CS approach that shifts the emphasis 
from the sampling rate to the number of bits per measurement. In particular, we explore the 
extreme case of 1-bit CS measurements, which capture just their sign. Our results come in two 
flavors. First, we consider ideal reconstruction from noiseless 1-bit measurements and provide a 
lower bound on the best achievable reconstruction error. We also demonstrate that i.i.d. random 
Gaussian matrices provide measurement mappings that, with overwhelming probability, achieve 
nearly optimal error decay. Next, we consider reconstruction robustness to measurement errors 
and noise and introduce the Binary ε-Stable Embedding (BεSE) property, which characterizes
the robustness of the measurement process to sign changes. We show that the same class of 
matrices that provide almost optimal noiseless performance also enable such a robust mapping. 
On the practical side, we introduce the Binary Iterative Hard Thresholding (BIHT) algorithm 
for signal reconstruction from 1-bit measurements that offers state-of-the-art performance. 

1 Introduction 

Recent advances in signal acquisition theory have led to significant interest in alternative sampling 
methods. Specifically, conventional sampling systems rely on the Shannon sampling theorem that 
states that signals must be sampled uniformly at the Nyquist rate, i.e., a rate twice their bandwidth. 
However, the compressive sensing (CS) framework describes how to reconstruct a signal $x \in \mathbb{R}^N$
from the linear measurements

$$y = \Phi x, \qquad (1)$$

where $\Phi \in \mathbb{R}^{M \times N}$ with $M < N$ is an underdetermined measurement system [1, 2]. It is possible
to design a physical sampling system $\Phi$ such that $y = \Phi x$, where $x$ is a vector of
Nyquist-rate samples of a bandlimited signal $x(t)$, $t \in \mathbb{R}$. In this case, (1) translates to low,
sub-Nyquist sampling rates, providing the framework's axial significance: CS enables the acquisition and
accurate reconstruction of signals that were previously out of reach, limited by hardware sampling
rates [3] or number of sensors [4].

Although inversion of (1) seems ill-posed, it has been demonstrated that $K$-sparse signals, i.e.,
$x \in \Sigma_K$ where $\Sigma_K := \{x \in \mathbb{R}^N : \|x\|_0 := |\mathrm{supp}(x)| \le K\}$, can be reconstructed exactly [1, 2]. To
do this, we could naively solve for the sparsest signal that satisfies (1),

$$x^* = \operatorname*{argmin}_{u \in \mathbb{R}^N} \|u\|_0 \quad \text{s.t.} \quad y = \Phi u; \qquad (R_{CS})$$

however, this non-convex program exhibits combinatorial complexity in the size of the problem [5].
Instead, we solve Basis Pursuit (BP) by relaxing the objective in $(R_{CS})$ to the ℓ1-norm; the result
is a convex, polynomial-time algorithm [6]. A key realization is that, under certain conditions on
$\Phi$, the BP solution will be equivalent to that of $(R_{CS})$. This basic reconstruction framework
has been expanded to include numerous fast algorithms as well as provably robust algorithms
for reconstruction from noisy measurements [7-11]. Reconstruction can also be performed with
iterative and greedy methods [12-14].

Reconstruction guarantees for BP and other algorithms are often demonstrated for $\Phi$ that are
endowed with the restricted isometry property (RIP), the sufficient condition that the norm of the
measurements is close to the norm of the signal for all sparse $x$. The RIP can be expressed, in
general terms, as a δ-stable embedding. Let $\delta \in (0, 1)$ and $X, S \subset \mathbb{R}^N$. We say the mapping $\Phi$ is a
δ-stable embedding of $X, S$ if



$$(1 - \delta)\|x - s\|_2^2 \le \|\Phi x - \Phi s\|_2^2 \le (1 + \delta)\|x - s\|_2^2, \qquad (2)$$

for all $x \in X$ and $s \in S$. The RIP requires that (2) hold for all $x, s \in \Sigma_K$; that is, it is a stable
embedding of sparse vectors. A key result in the CS literature is that, if the coefficients of $\Phi$ are
randomly drawn from a sub-Gaussian distribution, then $\Phi$ will satisfy the RIP with high probability
as long as $M \ge C_\delta K \log(N/K)$, for some constant $C_\delta$ [16, 17]. Several hardware-inspired designs
with only a few randomized components have also been shown to satisfy this property [T5H2U] . 
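
As a rough numerical illustration of the stable-embedding inequality (2), the sketch below draws a Gaussian matrix and measures how far the ratio $\|\Phi x - \Phi s\|_2^2 / \|x - s\|_2^2$ strays from 1 on a handful of random sparse pairs. The $1/\sqrt{M}$ scaling of the entries, the problem sizes, and all function names are assumptions made for this example. It checks sampled pairs only, which is much weaker than the uniform statement of the RIP.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """M x N matrix with i.i.d. N(0, 1/M) entries, so E||Phi v||^2 = ||v||^2."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def random_sparse(n, k, rng):
    """Vector in R^n with k nonzero Gaussian entries on a random support."""
    v = [0.0] * n
    for i in rng.sample(range(n), k):
        v[i] = rng.gauss(0.0, 1.0)
    return v


def sq_norm(v):
    return sum(t * t for t in v)


def matvec(phi, v):
    return [sum(p * t for p, t in zip(row, v)) for row in phi]


def embedding_ratios(m=300, n=500, k=5, pairs=20, seed=0):
    """Ratios ||Phi x - Phi s||^2 / ||x - s||^2 over random K-sparse pairs."""
    rng = random.Random(seed)
    phi = gaussian_matrix(m, n, rng)
    ratios = []
    for _ in range(pairs):
        x, s = random_sparse(n, k, rng), random_sparse(n, k, rng)
        d = [a - b for a, b in zip(x, s)]
        ratios.append(sq_norm(matvec(phi, d)) / sq_norm(d))
    return ratios


ratios = embedding_ratios()
# Smallest delta for which (2) holds on the sampled pairs (not on all sparse pairs).
delta = max(abs(r - 1.0) for r in ratios)
print(f"empirical delta over {len(ratios)} sampled pairs: {delta:.3f}")
```

With $M = 300 < N = 500$ the system is underdetermined, yet the sampled ratios should stay close to 1, which is the behavior that (2) formalizes.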

In practice, CS measurements must be quantized, i.e., each measurement is mapped from a real 
value (over a potentially infinite range) to a discrete value over some finite range. For example, in 
uniform quantization, a measurement is mapped to one of $2^B$ distinct values, where $B$ denotes the
number of bits per measurement. Quantization is an irreversible process that introduces error in the 
measurements. One way to account for quantization error is to treat it as bounded noise and employ 
robust reconstruction algorithms. Alternatively, we might try to reduce the error by choosing the 
most efficient quantizer for the distribution of the measurements. Several reconstruction techniques 
that specifically address CS quantization have also been proposed [21-26].
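
The uniform quantizer described above can be sketched as follows. The input range $[-1, 1]$, the saturation of out-of-range inputs, and the choice of cell midpoints as output values are assumptions made for illustration; the text does not fix them.

```python
def uniform_quantize(value, bits, lo=-1.0, hi=1.0):
    """Map a real value to the midpoint of one of 2**bits equal-width cells on [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    clipped = min(max(value, lo), hi)                    # saturate out-of-range inputs
    index = min(int((clipped - lo) / step), levels - 1)  # cell containing the input
    return lo + (index + 0.5) * step


def max_error(bits, samples=1000):
    """Largest quantization error over a uniform grid of in-range inputs."""
    grid = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(v - uniform_quantize(v, bits)) for v in grid)


for b in (1, 2, 4, 8):
    print(f"B = {b}: {2 ** b} levels, max error on grid = {max_error(b):.4f}")
```

For in-range inputs the error is at most half a cell width, so it halves with every added bit. With `bits=1` the output takes only two values, and on a range symmetric about zero it depends only on the sign of the input, which is the extreme case studied in this paper.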



1 The RIP is in fact not needed to demonstrate exact reconstruction guarantees in noiseless settings; however, it
proves quite useful for establishing robust reconstruction guarantees in noise.

While quantization error is a minor inconvenience, fine quantization invokes a more burdensome,
yet often overlooked source of adversity: in hardware systems, it is the primary bottleneck limiting
sample rates [27, 28]. In other words, the analog-to-digital converter (ADC) is beholden to the
quantizer. First, quantization significantly limits the maximum speed of the ADC, forcing an
exponential decrease in sampling rate as the number of bits is increased linearly [28]. Second, the
quantizer is the primary power consumer in an ADC. Thus, more bits per measurement directly
translate to slower sampling rates and increased ADC costs. Third, fine quantization is more
susceptible to non-linear distortion in the ADC electronics, requiring explicit treatment in the
reconstruction [29]. As we have seen, the CS framework provides one mechanism to alleviate
the quantization bottleneck by reducing the ADC sampling rate. Is it possible to extend the CS 
framework to mitigate this problem directly in the quantization domain by reducing the number 
of bits per measurement (bit-depth) instead? 

In this paper we concretely answer this question in the affirmative. We consider an extreme
quantization: just one bit per CS measurement, representing its sign. The quantizer is thus reduced
to a simple comparator that tests for values above or below zero, enabling extremely simple,
efficient, and fast quantization. A 1-bit quantizer is also more robust to a number of commonly 
encountered non-linear distortions in the input electronics, as long as they preserve the signs of the 
measurements. 

It is not obvious that the signs of the CS measurements retain enough information for signal 
reconstruction; for example, it is immediately clear that the scale (absolute amplitude) of the signal 
is lost. Nonetheless, there is strong empirical evidence that signal reconstruction is possible [29-32].
In this paper we develop strong theoretical reconstruction and robustness guarantees, in the same 
spirit as classical guarantees provided in CS by the RIP. 

We briefly describe the 1-bit CS framework proposed in [30]. Measurements of a signal $x \in \mathbb{R}^N$
are computed via

$$y = A(x) := \operatorname{sign}(\Phi x), \qquad (3)$$

where the sign operator is applied component-wise on $\Phi x$, and where $\operatorname{sign} \lambda$ equals 1 if $\lambda > 0$ and $-1$
otherwise, for any $\lambda \in \mathbb{R}$. Thus, the measurement operator $A(\cdot)$ is a mapping from $\mathbb{R}^N$ to the
Boolean cube$^2$ $\mathcal{B}^M := \{-1, 1\}^M$. At best, we hope to recover signals $x \in \Sigma^*_K := \{x \in S^{N-1} :
\|x\|_0 \le K\}$ where $S^{N-1} := \{x \in \mathbb{R}^N : \|x\|_2 = 1\}$ is the unit hyper-sphere of dimension $N$. We
restrict our attention to sparse signals on the unit sphere since, as previously mentioned, the scale
of the signal has been lost during the quantization process. To reconstruct, we enforce consistency
on the signs of the estimate's measurements, i.e., that $A(x^*) = A(x)$. Specifically, we define a
general non-linear reconstruction decoder $\Delta^{1\mathrm{bit}}(y, \Phi, K)$ such that, for $x^* = \Delta^{1\mathrm{bit}}(y, \Phi, K)$, the
solution $x^*$ is

(i) sparse, i.e., satisfies $\|x^*\|_0 \le K = \|x\|_0$,

(ii) consistent, i.e., satisfies $A(x^*) = y = A(x)$.



With $(R_{CS})$ from CS as a guide, one candidate program for reconstruction that respects these two
conditions is

$$x^* = \operatorname*{argmin}_{u \in S^{N-1}} \|u\|_0 \quad \text{s.t.} \quad y = \operatorname{sign}(\Phi u). \qquad (R_{1BCS})$$



2 Generally, the $M$-dimensional Boolean cube is defined as $\{0, 1\}^M$. Without loss of generality, we use $\{-1, 1\}^M$
instead.

Although the parameter $K$ is not explicit in $(R_{1BCS})$, the solution will be $K'$-sparse with $K' \le K$
because $x$ is a feasible point of the constraints.
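
The measurement model (3) and the consistency condition (ii) can be sketched in a few lines. The dimensions, the random seed, and the helper names (`one_bit_measure`, `is_consistent`) are hypothetical choices for this example. The last lines illustrate why attention is restricted to the unit sphere: scaling the signal by a positive factor leaves every measurement sign unchanged.

```python
import math
import random


def sign(t):
    """Sign convention of (3): +1 if t > 0, otherwise -1."""
    return 1 if t > 0 else -1


def one_bit_measure(phi, x):
    """A(x) = sign(Phi x), applied component-wise."""
    return [sign(sum(p * t for p, t in zip(row, x))) for row in phi]


def is_consistent(phi, y, x_hat):
    """True when the estimate reproduces every measured sign, i.e. A(x_hat) = y."""
    return one_bit_measure(phi, x_hat) == y


def normalize(v):
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


rng = random.Random(1)
M, N, K = 50, 20, 3
phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]

# K-sparse signal on the unit sphere.
x = [0.0] * N
for i in rng.sample(range(N), K):
    x[i] = rng.gauss(0.0, 1.0)
x = normalize(x)

y = one_bit_measure(phi, x)
y_scaled = one_bit_measure(phi, [8.0 * t for t in x])  # same direction, larger amplitude
print("scale is invisible to the signs:", y == y_scaled)
```

Any positive multiple of $x$ is therefore consistent with $y$, so a decoder can at best recover the direction of the signal.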

Since $(R_{1BCS})$ is computationally intractable, [30] proposes a relaxation that replaces the objective
with the ℓ1-norm and enforces consistency via a linear convex constraint. However, the
resulting program remains non-convex due to the unit-sphere requirement. Be that as it may, several
optimization algorithms have been developed for the relaxation, as well as a greedy algorithm
inspired by the same ideas [30-32]. While previous empirical results from these algorithms provide
motivation for the validity of this 1-bit framework, there have been few analytical guarantees to 
date. 

The primary contribution of this paper is a rigorous analysis of the 1-bit CS framework. Specifically,
we examine how the reconstruction error behaves as we increase the number of measurement
bits $M$ given the signal dimension $N$ and sparsity $K$. We provide two flavors of results. First,
we determine a lower bound on reconstruction performance from all possible mappings $A$ with the
reconstruction decoder $\Delta^{1\mathrm{bit}}$, i.e., the best achievable performance of this 1-bit CS framework. We
further demonstrate that if the elements of $\Phi$ are drawn randomly from a Gaussian distribution or
its rows are drawn uniformly from the unit sphere, then the worst-case reconstruction error using
$\Delta^{1\mathrm{bit}}$ will decay at a rate almost optimal with the number of measurements, up to a log factor in
the oversampling rate $M/K$ and the signal dimension $N$. Second, we provide conditions on $A$ that
enable us to characterize the reconstruction performance even when some of the measurement signs
have changed (e.g., due to noise in the measurements). In other words, we derive the conditions
under which robust reconstruction from 1-bit measurements can be achieved. We do so by demonstrating
that $A$ is a stable embedding of sparse signals, similar to the RIP. We apply these stable
embedding results to the cases where we have noisy measurements and signals that are not strictly 
sparse. Our guarantees demonstrate that the 1-bit CS framework is on sound footing and provide 
a first step toward analysis of the relaxed 1-bit techniques used in practice. 

To develop robust reconstruction guarantees, we propose a new tool, the binary ε-stable embedding
(BεSE), to characterize 1-bit CS systems. The BεSE implies that the normalized angle
between any sparse vectors in $S^{N-1}$ is close to the normalized Hamming distance between their
1-bit measurements. We demonstrate that the same class of random $A$ as above exhibit this property
when $M \ge C_\epsilon K \log N$ (where $C_\epsilon$ is some constant). Thus remarkably, there exist $A$ such that
the BεSE holds when both the number of measurements $M$ is smaller than the dimension of the
signal $N$ and the measurement bit-depth is at minimum.

As a complement to our theoretical analysis, we introduce a new 1-bit CS reconstruction algorithm,
Binary Iterative Hard Thresholding (BIHT). Via simulations, we demonstrate that BIHT
yields a significant improvement in both reconstruction error as well as consistency, as compared 
with previous algorithms. To gain intuition about the behavior of BIHT, we explore the way that 
this algorithm enforces consistency and compare and contrast it with previous approaches. Perhaps 
more important than the algorithm itself is the discovery that the BIHT consistency formulation 
provides a significantly better feasible solution in noiseless settings, as compared with previous 
algorithms. Finally, we provide a brief explanation regarding why this new formulation achieves 
better solutions, and its connection with results in the machine learning literature. 

Since the first appearance of this work, Plan and Vershynin have developed additional theoretical
results and bounds on the performance of 1-bit CS, as well as two convex algorithms with
theoretical guarantees [33-35]. The results in [33] generalize the BεSE guarantees for more general
classes of signals, including compressible signals in addition to simply sparse ones. However,
the guarantees provided in that work exhibit worse decay rates in the error performance and the
tightness of the BεSE property. Furthermore, the results of [34, 35] are intimately tied to reconstruction
algorithms, in contrast to our analysis. We point out similarities and differences with our
results when appropriate in the subsequent development. 

In addition to benchmarking the performance of BIHT, our simulations demonstrate that many 
of the theoretical predictions that arise from our analysis (such as the error rate as a function of the 
number of measurements or the error rate as a function of measurement Hamming distance), are 
actually exhibited in practice. This suggests that our theoretical analysis is accurately explaining 
the true behavior of the framework. 

The remainder of this paper is organized as follows. In Section 2 we develop performance
results for 1-bit CS in the noiseless setting. Specifically, we develop a lower bound on reconstruction
performance as well as provide the guarantee that Gaussian matrices enable this performance. In
Section 3 we introduce the notion of a BεSE for the mapping $A$ and demonstrate that Gaussian
matrices facilitate this property. We also expand reconstruction guarantees for measurements
with Gaussian noise (prior to quantization) and non-sparse signals. To make use of these results in
practice, in Section 4 we present the BIHT algorithm for practical 1-bit reconstruction. In Section 5
we provide simulations of BIHT to verify our claims. In Section 6 we conclude with a discussion
about implications and future extensions. To facilitate the flow of the paper and clear descriptions 
of the results, most of our proofs are provided in the appendices. 



2 Noiseless Reconstruction Performance 
2.1 Reconstruction performance lower bounds 

In this section, we seek to provide guarantees on the reconstruction error from 1-bit CS measurements.
Before analyzing this performance from a specific mapping $A$ with the consistent sparse
reconstruction decoder $\Delta^{1\mathrm{bit}}(y, \Phi, K)$, it is instructive to determine the best achievable performance
from measurements acquired using any mapping. Thus, in this section we seek a lower bound on 
the reconstruction error. 

We develop the lower bound on the reconstruction error based on how well the quantizer exploits 
the available measurement bits. A distinction we make in this section is that of measurement bits, 
which is the number of bits acquired by the measurement system, versus information bits, which 
represent the true amount of information carried in the measurement bits. Our analysis follows 
similar ideas to that in [361 EI], adapted to sign measurements. 

We first examine how 1-bit quantization operates on the measurements. Specifically, we consider 
the orthants of the measurement space. An orthant in $\mathbb{R}^M$ is the set of vectors such that all the
vector's coefficients have the same sign pattern

$$\mathcal{O}_{\bar{z}} := \{z \in \mathbb{R}^M \mid \operatorname{sign} z = \bar{z}\}, \qquad (4)$$

where $\bar{z} \in \mathcal{B}^M$. Notice that $\bigcup_{\bar{z} \in \mathcal{B}^M} \mathcal{O}_{\bar{z}} = \mathbb{R}^M$ and $\mathcal{O}_{\bar{z}} \cap \mathcal{O}_{\bar{z}'} = \emptyset$ if $\bar{z} \ne \bar{z}'$. Therefore, any
$M$-dimensional space is partitioned into $2^M$ orthants. Figure 1(a) shows the 8 orthants of $\mathbb{R}^3$ as
an example. Since 1-bit quantization only preserves the signs of the measurements, it encodes in
which measurement space orthant the measurements lie. Thus, each available quantization point
corresponds to an orthant in the measurement space. Any unquantized measurement vector $\Phi x$
that lies in an orthant of the measurement space will quantize to the corresponding quantization
point of that orthant and cannot be distinguished from any other measurement vector in the same
orthant. To obtain a lower bound on the reconstruction error, we begin by bounding the number
of quantization points (or equivalently the number of orthants) that are used to encode the signal.

Figure 1: (a) The 8 orthants in $\mathbb{R}^3$. (b) Intersection of orthants by a 2-dimensional subspace. At most 6 of the 8
available orthants are intersected.

While there are generally $2^M$ orthants in the measurement space, the space formed by measuring
all sparse signals occupies a small subset of the available orthants. We determine the number of
available orthants that can be intersected by the measurements in the following lemma:

Lemma 1. Let $x \in S := \bigcup_{i=1}^{L} S_i$ belong to a union of $L$ subspaces $S_i \subset \mathbb{R}^N$ of dimension $K$, and
let $M \ge 2K$ 1-bit measurements $y$ be acquired via the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ as defined in (3).
Then the measurements $y$ can effectively use at most $2^K L \binom{M}{K}$ quantization points, i.e., carry at
most $K \log_2(2Me/K) + \log_2(L)$ information bits.

Proof. A $K$-dimensional subspace in an $M$-dimensional space cannot lie in all the $2^M$ available
orthants. For example, as shown in Fig. 1(b), a 2-dimensional subspace of a 3-dimensional space
can intersect at most 6 of the available orthants. In Appendix A we demonstrate that one arbitrary
$K$-dimensional subspace in an $M$-dimensional space intersects at most $2^K \binom{M}{K}$ orthants of the $2^M$
available. Since $\Phi$ is a linear operator, any $K$-dimensional subspace $S_i$ in the signal space $\mathbb{R}^N$ is
mapped through $\Phi$ to a subspace $S_i' = \Phi S_i \subset \mathbb{R}^M$ that is also at most $K$-dimensional and therefore
follows the same bound. Thus, if the signal of interest belongs in a union $S := \bigcup_{i=1}^{L} S_i$ of $L$ such
$K$-dimensional subspaces, then $\Phi x \in S' := \bigcup_{i=1}^{L} S_i'$, and it follows that at most $2^K L \binom{M}{K}$ orthants
are intersected. This means that at most $2^K L \binom{M}{K} \le L \left(\frac{2eM}{K}\right)^K$ effective quantization points can be
used, i.e., at most $K \log_2(2eM/K) + \log_2(L)$ information bits can be obtained. □
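
The counting argument can be probed empirically. The sketch below samples points from a random $K$-dimensional subspace of $\mathbb{R}^M$ and records which orthants they fall in, once for the setting of Fig. 1(b) and once for a case with $M \ge 2K$. Sampling can only miss orthants, so the counts are lower estimates of the number of orthants intersected. The function names, sample sizes, and seed are assumptions of this example, and the bound is taken as $2^K L \binom{M}{K}$ for $L$ subspaces.

```python
import math
import random


def sign_pattern(z):
    """Orthant label of z, using the convention sign(0) := -1."""
    return tuple(1 if t > 0 else -1 for t in z)


def orthants_hit(m, k, samples, rng):
    """Orthants of R^m reached by random points of a random k-dimensional subspace."""
    basis = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]
    seen = set()
    for _ in range(samples):
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(m)]
        seen.add(sign_pattern(z))
    return seen


def lemma1_points(m, k, num_subspaces=1):
    """Quantization-point bound 2^K * L * C(M, K)."""
    return (2 ** k) * num_subspaces * math.comb(m, k)


def lemma1_bits(m, k, num_subspaces=1):
    """Information-bit bound K * log2(2 M e / K) + log2(L)."""
    return k * math.log2(2.0 * m * math.e / k) + math.log2(num_subspaces)


rng = random.Random(0)
hit_3d = orthants_hit(3, 2, 5000, rng)  # a plane in R^3, as in Fig. 1(b)
hit_8d = orthants_hit(8, 2, 5000, rng)  # a plane in R^8, where M >= 2K holds
print(f"R^3: {len(hit_3d)} of {2 ** 3} orthants reached")
print(f"R^8: {len(hit_8d)} of {2 ** 8} orthants reached, "
      f"bound {lemma1_points(8, 2)}, at most {lemma1_bits(8, 2):.2f} bits")
```

In $\mathbb{R}^8$ only a small fraction of the 256 orthants can be reached by a 2-dimensional subspace, so far fewer than $M = 8$ bits of information are carried by the 8 measurement bits.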



Since $K$-sparse signals in any basis $\Psi \in \mathbb{R}^{N \times N}$ belong to a union of at most $\binom{N}{K}$ subspaces
in $\mathbb{R}^N$ with $\binom{N}{K} \le (eN/K)^K$, using Lemma 1 we can obtain the following corollary.$^3$

3 This corollary is easily adaptable to a redundant frame $\Psi \in \mathbb{R}^{N \times D}$ with $D > N$.

Corollary 1. Let $x = \Psi\alpha \in \mathbb{R}^N$ be $K$-sparse in a certain basis $\Psi \in \mathbb{R}^{N \times N}$, i.e., $\alpha \in \Sigma_K$. Then
the measurements $y = A(x)$ can effectively use at most $2^K \binom{M}{K} \binom{N}{K}$ 1-bit quantization points, i.e.,
carry at most $2K \log_2(\sqrt{2}\, e \sqrt{NM}/K)$ information bits.
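
To get a feel for the gap between measurement bits and information bits, the following sketch evaluates the count $2^K \binom{M}{K} \binom{N}{K}$ exactly and compares its base-2 logarithm with the closed-form bound $2K \log_2(\sqrt{2}\, e \sqrt{NM}/K)$, which follows from $\binom{n}{k} \le (en/k)^k$. The parameter triples are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def corollary1_points(m, n, k):
    """Quantization-point count 2^K * C(M, K) * C(N, K)."""
    return (2 ** k) * math.comb(m, k) * math.comb(n, k)


def corollary1_bits(m, n, k):
    """Closed-form bound 2K * log2(sqrt(2) * e * sqrt(N * M) / K)."""
    return 2 * k * math.log2(math.sqrt(2.0) * math.e * math.sqrt(n * m) / k)


for m, n, k in [(100, 1000, 5), (500, 1000, 10), (2000, 1000, 10)]:
    exact = math.log2(corollary1_points(m, n, k))
    bound = corollary1_bits(m, n, k)
    print(f"M={m:5d} N={n} K={k:2d}: log2(points)={exact:7.1f}  "
          f"bound={bound:7.1f}  measurement bits={m}")
```

The information content grows only logarithmically in $M$, so adding measurement bits yields diminishing returns, which is the mechanism behind the error lower bound developed next.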

The set of signals of interest to be encoded is the set of unit-norm $K$-sparse signals $\Sigma^*_K$. Since
unit-norm signals of a $K$-dimensional subspace form a $K$-dimensional unit sphere in that subspace,
$\Sigma^*_K$ is a union of $\binom{N}{K}$ such unit spheres. The $Q := 2^K \binom{M}{K} \binom{N}{K}$ available quantization points partition
$\Sigma^*_K$ into $Q$ smaller sets, each of which contains all the signals that quantize to the same point.

To develop the lower bound on the reconstruction error we examine how $\Sigma^*_K$ can be optimally
partitioned with respect to the worst-case error, given the number of quantization points used. The
measurement and reconstruction process maps each signal in $\Sigma^*_K$ to a finite set of quantized signals
$\mathcal{Q} \subset S^{N-1}$, $|\mathcal{Q}| = Q$. At best this map ensures that the worst-case reconstruction error is minimized,
i.e.,

$$\epsilon_{opt} = \max_{x \in \Sigma^*_K} \min_{q \in \mathcal{Q}} \|x - q\|_2, \qquad (5)$$

where $\epsilon_{opt}$ denotes the worst-case quantization error and $q$ each of the available quantization points.
The optimal lower bound is achieved by designing $\mathcal{Q}$ to minimize (5) without considering whether
the measurement and reconstruction process actually achieve this design. Thus, designing the set
$\mathcal{Q}$ becomes a set covering problem. Appendix B makes this intuition precise and proves the following
statement.

Theorem 1. Let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ and measurements $y$ be defined as in (3) and let
$x \in \Sigma^*_K$. Then the estimate from the reconstruction decoder $\Delta^{1\mathrm{bit}}(y, \Phi, K)$ has error defined by (5)
of at least

$$\epsilon_{opt} \ge \frac{K}{2eM + 2K^{3/2}} = \Omega\!\left(\frac{K}{M}\right).$$

Thus, when $M$ is high compared to $K^{3/2}$, the worst-case error cannot decay at a rate faster
than $O(1/M)$ as a function of the number of measurements, no matter what reconstruction algorithm
is used.

This result assumes noiseless acquisition and provides no guarantees of robustness and noise 
resiliency. This is in line with existing results on scalar quantization in oversampled representations 
and CS that state that the distortion due to scalar quantization of noiseless measurements cannot 
decrease faster than the inverse of the measurement rate [36-40].

To improve the rate vs. distortion trade-off, alternative quantization methods must be used, such
as Sigma-Delta (ΣΔ) quantization [41-47] or non-monotonic scalar quantization [48]. Specifically,
ΣΔ approaches to CS can achieve an error decay rate of $O((K/M)^{p - 1/2})$, where $p$ is the order of the
quantizer [17]. However, ΣΔ quantization requires feedback during the quantization process, which
is not necessary in scalar quantization. Furthermore, the result in [47] only holds for multibit
quantizers, not 1-bit ones. While efficient 1-bit ΣΔ quantization has been shown for classical
sampling [42, 49], to the best of our knowledge, similar results are not currently known for 1-bit
ΣΔ in CS applications. Alternatively, non-monotonic scalar quantization can achieve error decay
exponential in the number of measurements $M$, even in CS applications [48]. However, such a
scheme requires a significantly more complex scalar quantizer and reconstruction approach [50].

Theorem 1 bounds the best possible performance of a consistent reconstruction over all possible
mappings $A$. However, not all mappings $A$ will behave as the lower bound suggests. In the next
section we identify two classes of matrices such that the mapping $A$ admits an upper bound on the
reconstruction error from a general decoder $\Delta^{1\mathrm{bit}}$ that decays almost optimally.



2.2 Achievable performance via random projections 

In this section we describe a class of matrices $\Phi$ such that the consistent sparse reconstruction decoder
$\Delta^{1\mathrm{bit}}(y, \Phi, K)$ can indeed achieve error decay rates of optimal order, described by Theorem 1,
with the number of measurements growing linearly in the sparsity $K$ and logarithmically in the
dimension $N$, as is required in conventional CS. We first focus our analysis on Gaussian matrices,
i.e., such that each element $\varphi_{ij}$ is randomly drawn i.i.d. from the standard Gaussian distribution,
$\mathcal{N}(0, 1)$. In the rest of the paper, we use the short notation $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$ to characterize such
matrices, and we write $\varphi \sim \mathcal{N}^{N \times 1}(0, 1)$ to describe the equivalent random vectors in $\mathbb{R}^N$ (e.g., the
rows of $\Phi$). For these matrices we prove the following in Appendix C.

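Theorem 2. Let $\Phi$ be a matrix generated as $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$, and let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$
be defined as in (3). Fix $0 \le \eta \le 1$ and $\epsilon_o > 0$. If the number of measurements is

$$M \ge \frac{2}{\epsilon_o}\left(2K \log(N) + 4K \log\left(\frac{17}{\epsilon_o}\right) + \log\frac{1}{\eta}\right), \qquad (6)$$

then for all $x, s \in \Sigma^*_K$ we have that

$$\|x - s\|_2 > \epsilon_o \;\Rightarrow\; A(x) \ne A(s), \qquad (7)$$

or equivalently

$$A(x) = A(s) \;\Rightarrow\; \|x - s\|_2 \le \epsilon_o,$$

with probability higher than $1 - \eta$.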

Theorem 2 is a uniform reconstruction result, meaning that with high probability all vectors
$x, s \in \Sigma^*_K$ can be reconstructed, as opposed to a non-uniform result where each vector could be
reconstructed with high probability.

As derived in Appendix C, Theorem 2 demonstrates that if we use Gaussian matrices in the
mapping $A$, then, given a fixed probability level $\eta$, the reconstruction decoder $\Delta^{1\mathrm{bit}}(y, \Phi, K)$ will
recover signals with error order

$$\epsilon_o = O\!\left(\frac{K}{M} \log \frac{MN}{K}\right),$$

which decays almost optimally compared to the lower bound given in Theorem 1, up to a log factor
in $MN/K$. Whether the gap can be closed with tighter lower or upper bounds is still an open
question. Notice that the hidden proportionality factor in this last relation depends linearly on
$\log(1/\eta)$, which is assumed fixed.

We should also note a few minor extensions of Theorem 2. We can multiply the rows of $\Phi$ with
a positive scalar without changing the signs of the measurements. By normalizing the rows of the
Gaussian matrix we obtain another class of matrices, ones with rows drawn uniformly from the
unit ℓ2 sphere in $\mathbb{R}^N$. It is thus straightforward to extend the theorem to matrices with such rows
as well. Furthermore, these projections are rotation invariant (often referred to as "universal" in
CS systems), meaning that the theorem remains valid for sparse signals in any basis $\Psi$, i.e., for
$x, s$ belonging to $\Sigma^*_{\Psi,K} := \{u = \Psi\alpha \in \mathbb{R}^N : \alpha \in \Sigma^*_K\}$. This is true since for any orthonormal basis
$\Psi \in \mathbb{R}^{N \times N}$, $\Phi' = \Phi\Psi \sim \mathcal{N}^{M \times N}(0, 1)$ when $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$.
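
The row-scaling remark is easy to check numerically. In the sketch below, a Gaussian matrix and its row-normalized version (rows on the unit sphere) are applied to the same signal and the resulting sign vectors are compared. Dimensions, seed, and function names are assumptions of this example.

```python
import math
import random


def sign(t):
    """+1 if t > 0, otherwise -1."""
    return 1 if t > 0 else -1


def one_bit_measure(phi, x):
    """A(x) = sign(Phi x), applied component-wise."""
    return [sign(sum(p * t for p, t in zip(row, x))) for row in phi]


def normalize_rows(phi):
    """Divide each row by its Euclidean norm, a positive per-row scaling."""
    out = []
    for row in phi:
        norm = math.sqrt(sum(p * p for p in row))
        out.append([p / norm for p in row])
    return out


rng = random.Random(2)
M, N = 40, 15
phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
x = [rng.gauss(0.0, 1.0) for _ in range(N)]

phi_unit = normalize_rows(phi)
same_signs = one_bit_measure(phi, x) == one_bit_measure(phi_unit, x)
print("row normalization preserves the 1-bit measurements:", same_signs)
```

Since the two matrices produce identical measurements, any guarantee stated for the sign measurements of one class transfers to the other.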
2.3 Related Work



A similar result to Theorem 2 has been recently shown for sign measurements of non-sparse signals
in the context of quantization using frame permutations [51]. Specifically, it has been shown that
reconstruction from sign measurements of signals can be achieved (almost surely) with an error rate
decay arbitrarily close to $O(1/M)$. Our main contribution here is demonstrating that this result is
true uniformly for all $K$-sparse vectors in $\mathbb{R}^N$, given a sparse and consistent decoder. Our results,
in addition to introducing the almost linear dependence on $K$, also show that proving this error
bound uniformly for all $K$-sparse signals involves a logarithmic penalty in $(MN)/K$. This does not
seem to be necessary from the lower bound in the previous section. We will see in Section 5 that for
Gaussian matrices, the optimal error behavior is empirically exhibited on average. Finally, we note
that for a constant $\epsilon_o$, the number of measurements required to guarantee (7) is $M = O(K \log N)$,
nearly the same order as in conventional CS.

Furthermore, since the first appearance of our work, a bound on the achievable reconstruction
error for compressible signals and for signals in arbitrary subsets of $\mathbb{R}^N$ appeared in [33, 34].
Specifically for compressible signals, that work leads to error decay $\epsilon = O\big((\frac{K}{M} \log \frac{N}{K})^{1/4}\big)$, which
decreases more slowly (with $K/M$) than both our bound and the one provided in [51]. However,
the results in [33, 34] are for more general classes of signals.

We can also view the binary measurements as a hash or a sketch of the signal. With this 
interpretation of the result we guarantee with high probability that no sparse vectors with Euclidean
distance greater than $\epsilon_o$ will "hash" to the same binary measurements. In fact, similar results play a
key role in locality sensitive hashing (LSH), a technique that aims to efficiently perform approximate
nearest neighbors searches from quantized projections [52-55]. Most LSH results examine the
performance on point-clouds of a discrete number of signals instead of the infinite subspaces that 
we explore in this paper. Furthermore, the primary goal of the LSH is to preserve the structure 
of the nearest neighbors with high probability. Instead, in this paper we are concerned with the 
ability to reconstruct the signal from the hash, as well as the robustness of this reconstruction to 
measurement noise and signal model mismatch. To enable these properties, we require a property 
of the mapping A that preserves the structure (geometry) of the entire signal set. Thus, in the next 
section we seek an embedding property of A that preserves geometry for the set of sparse signals 
and thus ensures robust reconstruction. 



3 Acquisition and Reconstruction Robustness 
3.1 Binary ε-stable embeddings

In this section we establish an embedding property for the 1-bit CS mapping $A$ that ensures
that the sparse signal geometry is preserved in the measurements, analogous to the RIP for real-valued
measurements. This robustness property enables us to upper bound the reconstruction
performance even when some measurement signs have been changed due to noise. Conventional CS
achieves robustness via the δ-stable embeddings of sparse vectors (2) discussed in Section 1. This
embedding is a restricted quasi-isometry between the metric spaces $(\mathbb{R}^N, d_X)$ and $(\mathbb{R}^M, d_Y)$, where
the distance metrics $d_X$ and $d_Y$ are the ℓ2-norm in dimensions $N$ and $M$, respectively, and the
domain is restricted to sparse signals.$^4$ We seek a similar definition for our embedding; however,
now the signals and measurements lie in the different spaces $S^{N-1}$ and $\mathcal{B}^M$, respectively. Thus, we
first consider appropriate distance metrics in these spaces.

The Hamming distance is the natural distance for counting the number of unequal bits between
two measurement vectors. Specifically, for $a, b \in \mathcal{B}^M$ we define the normalized Hamming distance
as

$$d_H(a, b) = \frac{1}{M} \sum_{i=1}^{M} a_i \oplus b_i,$$

where $a \oplus b$ is the XOR operation between $a, b \in \mathcal{B}$ such that $a \oplus b$ equals 0 if $a = b$ and 1 otherwise.
The distance is normalized such that $d_H \in [0, 1]$. In the signal space we only consider unit-norm
vectors, thus, a natural distance is the angle formed by any two of these vectors. Specifically, for
$x, s \in S^{N-1}$, we consider

$$d_S(x, s) := \frac{1}{\pi} \arccos\langle x, s \rangle.$$

As with the Hamming distance, we normalize the true angle $\arccos\langle x, s \rangle$ such that $d_S \in [0, 1]$. Note
that since both vectors have the same norm, the inner product $\langle x, s \rangle$ can easily be mapped to the
ℓ2-distance using the polarization identity.
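
Both distances are straightforward to compute. The sketch below is one possible implementation; the function names are hypothetical, and the clamping of the inner product is a numerical safeguard added here rather than part of the definition. The last helper uses the identity $\|x - s\|_2^2 = 2 - 2\langle x, s \rangle$, valid for unit-norm vectors, to convert the normalized angle into an ℓ2-distance.

```python
import math


def hamming_distance(a, b):
    """Normalized Hamming distance between two vectors in {-1, +1}^M."""
    if len(a) != len(b):
        raise ValueError("measurement vectors must have the same length")
    return sum(1 for ai, bi in zip(a, b) if ai != bi) / len(a)


def angular_distance(x, s):
    """Normalized angle (1/pi) * arccos<x, s> between unit-norm vectors."""
    inner = sum(xi * si for xi, si in zip(x, s))
    inner = max(-1.0, min(1.0, inner))  # guard against rounding outside [-1, 1]
    return math.acos(inner) / math.pi


def euclidean_from_angular(d_s):
    """l2-distance between unit vectors whose normalized angle is d_s."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.cos(math.pi * d_s)))


print(hamming_distance([1, -1, 1, 1], [1, 1, 1, -1]))  # two of four signs differ
print(angular_distance([1.0, 0.0], [0.0, 1.0]))        # orthogonal unit vectors
print(euclidean_from_angular(0.5))                     # distance for a right angle
```

Both quantities lie in $[0, 1]$, which is what makes them directly comparable in the embedding defined next.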

Using these distance metrics we define the binary stable embedding. 

Definition 1 (Binary ε-Stable Embedding). Let $\epsilon \in (0, 1)$. A mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ is a binary
ε-stable embedding (BεSE) of order $K$ for sparse vectors if

$$d_S(x, s) - \epsilon \le d_H(A(x), A(s)) \le d_S(x, s) + \epsilon$$

for all $x, s \in S^{N-1}$ with $|\mathrm{supp}(x) \cup \mathrm{supp}(s)| \le K$.

Our definition describes a specific quasi-isometry between the two metric spaces $(S^{N-1}, d_S)$
and $(\mathcal{B}^M, d_H)$, restricted to sparse vectors. While this mirrors the form of the δ-stable embedding
for sparse vectors, one important difference is that the sensitivity term $\epsilon$ is additive, rather than
multiplicative, and thus the BεSE is not bi-Lipschitz. This is a necessary side-effect of the loss of
information due to quantization.

Any BεSE $A(\cdot)$ of order $2K$ enables robustness guarantees on any reconstruction algorithm
extracting a unit sparse signal estimate $x^*$ of $x \in \Sigma^*_K$. In this case, the angular error is immediately
bounded by

$$d_S(x, x^*) \le d_H(A(x), A(x^*)) + \epsilon.$$

Thus, if an algorithm returns a unit norm sparse solution with measurements that are not consistent
(i.e., $d_H(A(x), A(x^*)) > 0$), as is the case with several algorithms [30-32], then the worst-case
angular reconstruction error is close to the Hamming distance between the estimate's measurements'
signs and the original measurements' signs. Section 5 verifies this behavior with simulation results.
Furthermore, in Section 3.3 we use the BεSE property to guarantee that if measurements are
corrupted by noise or if signals are not exactly sparse, then the reconstruction error is bounded.



4 A function $A : X \to Y$ is called a quasi-isometry between metric spaces $(X, d_X)$ and $(Y, d_Y)$ if there exist
$C > 0$ and $D \ge 0$ such that $\frac{1}{C} d_X(x, s) - D \le d_Y(A(x), A(s)) \le C d_X(x, s) + D$ for $x, s \in X$, and $E > 0$ such that
$d_Y(y, A(x)) < E$ for all $y \in Y$ [56]. Since $D = 0$ for $\delta$-stable embeddings, they are also called bi-Lipschitz mappings.
Note that, in the best case, for a BεSE $A(\cdot)$, the angular error of any sparse and consistent
decoder $\Delta^{1\mathrm{bit}}(y, K)$ is bounded by $\epsilon$ since then $d_H(A(x), A(x^*)) = 0$. As we have seen earlier, this
is to be expected because, unlike conventional noiseless CS, quantization fundamentally introduces
uncertainty and exact recovery cannot be guaranteed. This is an obvious consequence of the
mapping of the infinite set $\Sigma^*_K$ to a discrete set of quantized values.

We next identify a class of matrices $\Phi$ for which $A$ is a BεSE.



3.2 Binary ε-stable embeddings via random projections

As is the case for conventional CS systems with RIP, designing a $\Phi$ for 1-bit CS such that $A$ has the
BεSE property is possibly a computationally intractable task (and no such algorithm is yet known).
Fortunately, an overwhelming number of "good" matrices do exist. Specifically, we again focus our
analysis on Gaussian matrices $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$ as in Section 2.2. As motivation that this
choice of $\Phi$ will indeed enable robustness, we begin with a classical concentration of measure result
for binary measurements from a Gaussian matrix.

Lemma 2. Let $x, s \in S^{N-1}$ be a pair of arbitrary fixed vectors, draw $\Phi$ according to $\Phi \sim
\mathcal{N}^{M \times N}(0, 1)$, and let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ be defined as in (3). Fix $\epsilon > 0$. Then we
have

$$\mathbb{P}\big(\,|d_H(A(x), A(s)) - d_S(x, s)| \le \epsilon\,\big) \ge 1 - 2e^{-2\epsilon^2 M}, \qquad (8)$$

where the probability is with respect to the generation of $\Phi$.

Proof. This lemma is a simple consequence of Lemma 3.2 in [57], which shows that, for one
measurement, $\mathbb{P}[A_j(x) \ne A_j(s)] = d_S(x, s)$. The result then follows by applying Hoeffding's inequality
to the binomial random variable $M\, d_H(A(x), A(s))$ with $M$ trials. □

In words, Lemma 2 implies that the Hamming distance between two binary measurement vectors
$A(x)$, $A(s)$ tends to the angle between the signals $x$ and $s$ as the number of measurements $M$
increases. In [57] this fact is used in the context of randomized rounding for max-cut problems;
however, this property has also been used in similar contexts as ours with regard to preservation
of inner products from binary measurements [58], [59].
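The concentration described by Lemma 2 can be observed numerically. The sketch below is our own illustration, not an experiment from the paper: it fixes one pair $(x, s)$, redraws a Gaussian $\Phi$ many times for several values of $M$, and compares the empirical frequency of $|d_H - d_S| > \epsilon$ with the Hoeffding bound $2e^{-2\epsilon^2 M}$ from (8). The choices N = 30, $\epsilon$ = 0.05, and 200 draws per setting are arbitrary assumptions.

```python
import numpy as np

def deviation(x, s, M, rng):
    """Draw a fresh Gaussian Phi and return |d_H(A(x), A(s)) - d_S(x, s)|."""
    Phi = rng.standard_normal((M, x.size))
    # np.sign returns 0 only for an exactly zero projection, a probability-zero event here.
    d_h = np.mean(np.sign(Phi @ x) != np.sign(Phi @ s))
    d_s = np.arccos(np.clip(x @ s, -1.0, 1.0)) / np.pi
    return abs(d_h - d_s)

rng = np.random.default_rng(1)
N, eps, trials = 30, 0.05, 200       # illustrative values, not taken from the text
x = rng.standard_normal(N)
x /= np.linalg.norm(x)
s = rng.standard_normal(N)
s /= np.linalg.norm(s)

results = {}                         # M -> (mean deviation, empirical failure rate, bound)
for M in (100, 400, 1600):
    devs = np.array([deviation(x, s, M, rng) for _ in range(trials)])
    bound = min(1.0, 2.0 * np.exp(-2.0 * eps**2 * M))
    results[M] = (devs.mean(), np.mean(devs > eps), bound)
    print(M, results[M])
```

The bound is vacuous (equal to 1) for small $M$ and only becomes informative once $\epsilon^2 M$ is large, which is the regime the lemma is meant for.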

The expression (8) indeed looks similar to the definition of the BεSE; however, it only holds
for a fixed pair of arbitrary (not necessarily sparse) signals, chosen prior to drawing $\Phi$. Our goal
is to extend (8) to cover the entire set of sparse signals. Indeed, concentration results similar to
Lemma 2, although expressed in terms of norms, have been used to demonstrate the RIP [16]. These
techniques usually demonstrate that the cardinality of the space of all sparse signals is sufficiently
small, such that the concentration result can be applied to demonstrate that distances are preserved
with relatively few measurements.

Unfortunately, due to the non-linearity of $A$ we cannot immediately apply Lemma 2 using the
same procedure as in [16]. To briefly summarize, [16] proceeds by covering the set of all $K$-sparse
signals with a finite set of points (with covering radius $\delta > 0$). A concentration inequality is
then applied to this set of points. Since any sparse signal lies in a $\delta$-neighborhood of at least one
such point, the concentration property can be extended from the finite set to the whole set of
$K$-sparse signals by bounding the distance between the measurements of the points within the
$\delta$-neighborhood. Such an approach cannot be used to extend (8) to $\Sigma^*_K$, because the severe
discontinuity of our mapping does not permit us to characterize the measurements $A(x + s)$ using
$A(x)$ and $A(s)$ and obtain a bound on the distance between measurements of signals in a
$\delta$-neighborhood.

To resolve this issue, we extend Lemma 2 to include all points within Euclidean balls around the
vectors $x$ and $s$ inside the (sub)sphere $\Sigma^*(T) := \{u \in S^{N-1} : \mathrm{supp}\, u \subset T\}$ for some fixed support
set $T \subset [N] := \{1, \ldots, N\}$ of size $|T| = D$. Define the $\delta$-ball $B_\delta(x) := \{a \in S^{N-1} : \|x - a\|_2 < \delta\}$
to be the ball of Euclidean distance $\delta$ around $x$, and let $B^*_\delta(x) := B_\delta(x) \cap \Sigma^*(T)$.

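Lemma 3. Given $T \subset [N]$ of size $|T| = D$, let $\Phi$ be a matrix generated as $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$, and
let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ be defined as in (3). Fix $\epsilon > 0$ and $0 < \delta < 1$. For any $x, s \in \Sigma^*(T)$,
we have

$$\mathbb{P}\Big(\forall u \in B^*_\delta(x),\ \forall v \in B^*_\delta(s),\ \ |d_H(A(u), A(v)) - d_S(x, s)| \le \epsilon + \sqrt{\tfrac{\pi}{2} D}\,\delta\Big) \ge 1 - 2e^{-2\epsilon^2 M}.$$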

The proof of this result is given in Appendix D. It should be noted that the proof does not depend
on the radial behavior of the Gaussian pdf in $\mathbb{R}^N$. In other words, this result is easily generalizable
to matrices $\Phi$ whose rows are independent random vectors drawn from an isotropic pdf.

In words, if the width $\delta$ is sufficiently small, then the Hamming distance between the 1-bit
measurements $A(u)$, $A(v)$ of any points $u$, $v$ within the balls $B^*_\delta(x)$, $B^*_\delta(s)$, respectively, will be
close to the angle between the centers of the balls.

Lemma 3 is key for providing a similar argument to that in [16]. We now simply need to count
the number of pairs of $K$-sparse signals that are Euclidean distance $\delta$ apart. The lemma can
then be invoked to demonstrate that the angles between all of these pairs will be approximately
preserved by our mapping.$^5$ Thus, with Lemma 3 under our belt, we demonstrate in the appendix
the following result.



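Theorem 3. Let $\Phi$ be a matrix generated as $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$ and let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$
be defined as in (3). Fix $0 < \eta < 1$ and $\epsilon > 0$. If the number of measurements is

$$M \ge \frac{2}{\epsilon^2}\Big(K \log(N) + 2K \log\big(\tfrac{35}{\epsilon}\big) + \log\big(\tfrac{2}{\eta}\big)\Big), \qquad (9)$$

then with probability exceeding $1 - \eta$, the mapping $A$ is a BεSE of order $K$ for sparse vectors.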



As with Lemma 3, the theorem extends easily to matrices $\Phi$ with independent rows in $\mathbb{R}^N$
drawn from an isotropic pdf in this space.

By choosing $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$ with $M = O(K \log N)$, with high probability we ensure that the
mapping $A$ is a BεSE. Additionally, using (9) with a fixed $\eta$ and the development in Appendix G,
we find that the error decreases as

$$\epsilon = O\Big(\sqrt{\tfrac{K}{M} \log \tfrac{MN}{K}}\Big).$$

Unfortunately, this decay rate is slower, roughly by a factor of $\sqrt{K/M}$, than the lower bound
in Section 2.1. This error rate results from an application of the Chernoff-Hoeffding inequality in
the proof of Theorem 3. An open question is whether it is possible to obtain a tighter bound (with
optimal error rate) for this robustness property.

5 We note that the covering argument in the proof of Theorem 2 also employs $\delta$-balls in similar fashion but only
considers the probability that $d_H = 0$, rather than the concentration inequality.
As with Theorem 2, Gaussian matrices provide a universal mapping, i.e., the result remains
valid for sparse signals in a basis $\Psi \in \mathbb{R}^{N \times N}$. Moreover, Theorem 3 can also be extended to rows
of $\Phi$ that are drawn uniformly on the sphere, since the rows of $\Phi$ in Theorem 3 can be normalized
without affecting the outcome of the proof. Note that normalizing the Gaussian rows of $\Phi$ is as if
they had been drawn from a uniform distribution of unit-norm signals.

We have now established a random construction providing robust BεSEs with high probability:
1-bit quantized Gaussian projections. We now make use of this robustness by considering an
example where the measurements are corrupted by Gaussian noise.



3.3 Noisy measurements and compressible signals 

In practice, hardware systems may be inaccurate when taking measurements; this is often modeled 
by additive noise. The mapping A is robust to noise in an unusual way. After quantization, the 
measurements can only take the values $-1$ or $1$. Thus, we can analyze the reconstruction
performance from corrupted measurements by considering how many measurements flip their signs. For
example, we analyze the specific case of Gaussian noise on the measurements prior to quantization,
i.e.,

$$A_n(x) := \mathrm{sign}(\Phi x + n), \qquad (10)$$

where $n \in \mathbb{R}^M$ has i.i.d. elements $n_i \sim \mathcal{N}(0, \sigma^2)$. In this case, we demonstrate, via the following
lemma, a bound on the Hamming distance between the corrupted and ideal measurements with
the BεSE from Theorem 3 (see Appendix F).

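Lemma 4. Let $\Phi$ be a matrix generated as $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$, let the mapping $A : \mathbb{R}^N \to \mathcal{B}^M$ be
defined as in (3), and let $A_n : \mathbb{R}^N \to \mathcal{B}^M$ be defined as in (10). Let $n \in \mathbb{R}^M$ be a Gaussian random
vector with i.i.d. components $n_i \sim \mathcal{N}(0, \sigma^2)$. Fix $\gamma > 0$. Then, given $x \in \mathbb{R}^N$, we have

$$\mathbb{E}\big(d_H(A_n(x), A(x))\big) \le \epsilon(\sigma, \|x\|_2),$$
$$\mathbb{P}\big(d_H(A_n(x), A(x)) > \epsilon(\sigma, \|x\|_2) + \gamma\big) \le e^{-2M\gamma^2},$$

where $\epsilon(\sigma, \|x\|_2) := \frac{1}{2} \frac{\sigma}{\sqrt{\|x\|_2^2 + \sigma^2}} \le \frac{1}{2} \frac{\sigma}{\|x\|_2}$.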

If $x^*_n$ is the estimate from a sparse consistent reconstruction decoder $\Delta^{1\mathrm{bit}}(A_n(x), K)$ from
the measurements $A_n(x)$ with $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$ and if $M$ satisfies (9), then it immediately follows
from Lemma 4 and Theorem 3 that

$$d_S(x^*_n, x) \le d_H(A_n(x), A(x)) + \epsilon \le \epsilon(\sigma, \|x\|_2) + \gamma + \epsilon, \qquad (11)$$

with a probability higher than $1 - e^{-2M\gamma^2} - \eta$. Given alternative noise distributions, e.g., Poisson
noise, a similar analysis can be carried out to determine the likely number of sign flips and thus
provide a bound on the error due to noise.
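As a sanity check on how noise translates into sign flips, the sketch below (our illustration, not the paper's experiment) simulates $A_n(x)$ and $A(x)$ for a unit-norm sparse $x$ and compares the observed flip rate with the quantity $\frac{1}{2}\sigma/\sqrt{\|x\|_2^2 + \sigma^2}$, which is how we read $\epsilon(\sigma, \|x\|_2)$ in Lemma 4. The closed form $\frac{1}{\pi}\arctan(\sigma/\|x\|_2)$ used for reference is our own short calculation for Gaussian rows, not a statement from the text; the sizes and noise levels are arbitrary.

```python
import numpy as np

def flip_bound(sigma, x_norm):
    """(1/2) * sigma / sqrt(||x||^2 + sigma^2): our reading of epsilon(sigma, ||x||_2)."""
    return 0.5 * sigma / np.sqrt(x_norm**2 + sigma**2)

def exact_flip_rate(sigma, x_norm):
    """Per-measurement flip probability for Gaussian rows (our derivation, for reference)."""
    return np.arctan(sigma / x_norm) / np.pi

def empirical_flip_rate(x, sigma, M, rng):
    """Fraction of entries where sign(Phi x + n) differs from sign(Phi x)."""
    Phi = rng.standard_normal((M, x.size))
    clean = Phi @ x
    noisy = clean + sigma * rng.standard_normal(M)
    return np.mean(np.sign(noisy) != np.sign(clean))

rng = np.random.default_rng(2)
x = np.zeros(100)
x[:5] = 1.0
x /= np.linalg.norm(x)               # unit-norm, 5-sparse test signal (illustrative)

rates = {}                           # sigma -> (empirical, exact, bound)
for sigma in (0.1, 0.5, 1.0):
    rates[sigma] = (empirical_flip_rate(x, sigma, 20000, rng),
                    exact_flip_rate(sigma, 1.0),
                    flip_bound(sigma, 1.0))
    print(sigma, rates[sigma])
```

Under these assumptions the empirical rate should sit below the bound for every noise level, with the gap widening as $\sigma$ grows.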

Another practical consideration is that real signals are not always strictly $K$-sparse. Indeed, it
may be the case that signals are compressible; i.e., they can be closely approximated by a $K$-sparse
signal. In this case, we can reuse the non-uniform result of Lemma 2 to see that, given $x \in \mathbb{R}^N$
and for $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$,

$$\mathbb{P}\big(d_H(A(x), A(x_K)) > d_S(x, x_K) + \gamma\big) \le e^{-2M\gamma^2}.$$
In similar fashion to (11), if $M$ satisfies (9), this result and Theorem 3 imply that, given
$x \in S^{N-1}$ (not necessarily sparse) and for $\Phi \sim \mathcal{N}^{M \times N}(0, 1)$, the angular reconstruction error of
$x^* = \Delta^{1\mathrm{bit}}(A(x), \Phi, K)$ is such that $d_S(x^*, x_K) \le d_H(A(x^*), A(x_K)) + \epsilon = d_H(A(x), A(x_K)) + \epsilon \le
d_S(x, x_K) + \gamma + \epsilon$, with probability higher than $1 - e^{-2M\gamma^2} - \eta$. Therefore, from the triangle
inequality on $d_S$, this provides the bound

$$d_S(x^*, x) \le 2 d_S(x, x_K) + \gamma + \epsilon,$$

with the same probability. Much like conventional CS results, the reconstruction error depends on
the magnitude of the best $K$-term approximation error of the signal, here expressed angularly by
$d_S(x, x_K)$.

This reconstruction error bound is non-uniform with respect to the selection of $x \in \mathbb{R}^N$. A
uniform bound on the BεSE for more general classes of signals is developed in [33], [34], albeit with
a worse error decay, $\epsilon = O\big((\tfrac{K}{M} \log \tfrac{N}{K})^{1/4}\big)$, for compressible signals.

Thus far we have demonstrated a lower bound on the reconstruction error from 1-bit
measurements (Theorem 2) and introduced a condition on the mapping $A$ that enables stable reconstruction
in noiseless, noisy, and compressible settings (Definition 1). We have furthermore demonstrated
that a large class of random matrices (specifically, matrices with coefficients drawn from a Gaussian
distribution and matrices with rows drawn uniformly from the unit sphere) provide good mappings
(Theorem 3).

Using these results we can characterize the error performance of any algorithm that reconstructs
a $K$-sparse signal. If the reconstructed signal quantizes to the same quantization point as the
original data, then the error is characterized by Theorem 2. If the algorithm terminates unable
to reconstruct a signal consistent with the quantized data, then Theorem 3 describes how far the
solution is from the original signal. Since ($R_{1\mathrm{BCS}}$) is a combinatorially complex problem, in the
next section we describe a new greedy reconstruction algorithm that attempts to find a solution as
consistent with the measurements as possible, while guaranteeing this solution is $K$-sparse.



4 BIHT: A Simple First-Order Reconstruction Algorithm 
4.1 Problem formulation and algorithm definition 

We now introduce a simple algorithm for the reconstruction of sparse signals from 1-bit compressive
measurements. Our algorithm, Binary Iterative Hard Thresholding (BIHT), is a simple modification
of IHT, the real-valued algorithm from which it takes its name p3|. Demonstrating theoretical
convergence guarantees for BIHT is a subject of future work (and thus not shown in this paper);
however, the algorithm is of significant value since it i) has a simple and intuitive formulation
and ii) outperforms previous algorithms empirically, as demonstrated in Section 5. We further note
that the IHT algorithm has recently been extended to handle measurement non-linearities [60];
however, these results do not apply to quantized measurements since quantization does not satisfy
the requirements in [60].

We briefly recall that the IHT algorithm consists of two steps that can be interpreted as follows.
The first step can be thought of as a gradient descent to reduce the least squares objective
$\|y - \Phi x\|_2^2/2$. Thus, at iteration $l$, IHT proceeds by setting $a^{l+1} = x^l + \Phi^T(y - \Phi x^l)$. The second step
imposes a sparse signal model by projecting $a^{l+1}$ onto the "$\ell_0$ ball", i.e., selecting the $K$ largest in
magnitude elements. Thus, IHT for CS can be thought of as trying to solve the problem

$$\operatorname*{argmin}_u\ \tfrac{1}{2}\|y - \Phi u\|_2^2 \quad \text{s.t.} \quad \|u\|_0 = K. \qquad (12)$$

The BIHT algorithm simply modifies the first step of IHT to instead minimize a
consistency-enforcing objective. Specifically, given an initial estimate $x^0 = 0$ and 1-bit measurements $y$, at
iteration $l$ BIHT computes

$$a^{l+1} = x^l + \tfrac{\tau}{2} \Phi^T\big(y - A(x^l)\big), \qquad (13)$$
$$x^{l+1} = \eta_K(a^{l+1}), \qquad (14)$$

where $A$ is defined as in (3), $\tau$ is a scalar that controls gradient descent step-size, and the function
$\eta_K(v)$ computes the best $K$-term approximation of $v$ by thresholding. Once the algorithm has
terminated (either consistency is achieved or a maximum number of iterations has been reached),
we then normalize the final estimate to project it onto the unit sphere. Section 4.2 discusses several
variations of this algorithm, each with different properties.
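The two-step iteration above translates almost line by line into NumPy. The following is a minimal sketch under stated assumptions rather than the authors' reference implementation: the function names, the convention sign(0) := +1, the default step size, the iteration cap, and the small problem sizes are all ours.

```python
import numpy as np

def sign_pm1(v):
    """Sign with sign(0) := +1 (an assumed convention for this sketch)."""
    return np.where(v >= 0, 1.0, -1.0)

def hard_threshold(v, K):
    """eta_K: keep the K largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-K:]
    out[keep] = v[keep]
    return out

def biht(y, Phi, K, tau=None, max_iter=200):
    """Iterate the gradient step and the thresholding step from x^0 = 0, then normalize."""
    M, N = Phi.shape
    if tau is None:
        tau = 1.0 / M                  # hypothetical default; not a value from the text
    x = np.zeros(N)
    for _ in range(max_iter):
        residual = y - sign_pm1(Phi @ x)
        if not residual.any():         # consistency reached: A(x) == y
            break
        a = x + 0.5 * tau * (Phi.T @ residual)   # gradient step
        x = hard_threshold(a, K)                 # impose the K-sparse model
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

rng = np.random.default_rng(3)
N, M, K = 100, 500, 5                  # small illustrative sizes
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x_true /= np.linalg.norm(x_true)
Phi = rng.standard_normal((M, N))
y = sign_pm1(Phi @ x_true)

x_hat = biht(y, Phi, K)
hamming_err = np.mean(sign_pm1(Phi @ x_hat) != y)
angular_err = np.arccos(np.clip(x_hat @ x_true, -1.0, 1.0)) / np.pi
print(f"Hamming error {hamming_err:.4f}, angular error {angular_err:.4f}")
```

One consequence of starting from $x^0 = 0$ in this un-normalized variant: the sign operation ignores positive scaling and thresholding commutes with it, so every iterate scales linearly with $\tau$ and the normalized output does not depend on its value. The step size becomes important once a per-iteration projection onto the sphere is added.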

The key to understanding BIHT lies in the formulation of the objective. The following lemma
shows that the term $\frac{1}{2}\Phi^T\big(y - A(x^l)\big)$ in (13) is in fact the negative subgradient of a convex objective
$J$. Let $[\cdot]_-$ denote the negative function, i.e., $([u]_-)_i = [u_i]_-$ with $[u_i]_- = u_i$ if $u_i < 0$ and $0$ else,
and $u \odot v$ denote the Hadamard product, i.e., $(u \odot v)_i = u_i v_i$ for two vectors $u$ and $v$.

Lemma 5. The quantity $\frac{1}{2}\Phi^T\big(A(x) - y\big)$ in (13) is a subgradient of the convex one-sided $\ell_1$-norm

$$J(x) = \|[y \odot (\Phi x)]_-\|_1.$$

Thus, BIHT aims to decrease $J$ at each step (13).



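Proof. We first note that $J$ is convex. We can write $J(x) = \sum_i J_i(x)$ with each convex function
$J_i$ given by

$$J_i(x) = \begin{cases} |\langle \varphi_i, x \rangle|, & \text{if } A_i(x)\, y_i < 0, \\ 0, & \text{else,} \end{cases}$$

where $\varphi_i$ denotes a row of $\Phi$ and $A_i(x) = \mathrm{sign}\, \langle \varphi_i, x \rangle$. Moreover, if $\langle \varphi_i, x \rangle \ne 0$, then the gradient
of $J_i$ is

$$\nabla J_i(x) = \tfrac{1}{2}\big(A_i(x) - y_i\big)\varphi_i = \begin{cases} A_i(x)\, \varphi_i, & \text{if } y_i A_i(x) < 0, \\ 0, & \text{else,} \end{cases}$$

while if $\langle \varphi_i, x \rangle = 0$, then the gradient is replaced by the subdifferential set

$$\partial J_i(x; y, \Phi) = \big\{ \tfrac{\xi}{2}\big(A_i(x) - y_i\big)\varphi_i : \xi \in [0, 1] \big\} \ni \tfrac{1}{2}\big(A_i(x) - y_i\big)\varphi_i.$$

Thus, by summing over $i$ we conclude that $\frac{1}{2}\Phi^T\big(A(x) - y\big) \in \partial J(x; y, \Phi)$. □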
Consequently, the BIHT algorithm can be thought of as trying to solve the problem:

$$x^* = \operatorname*{argmin}_u\ \tau \|[y \odot (\Phi u)]_-\|_1 \quad \text{s.t.} \quad \|u\|_0 = K,\ \|u\|_2 = 1.$$

Observe that since $y \odot (\Phi x)$ simply scales the elements of $\Phi x$ by the signs $y$, minimizing the
one-sided $\ell_1$ objective enforces a positivity requirement,

$$y \odot (\Phi x) \ge 0, \qquad (15)$$

that, when satisfied, implies consistency.
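The relationship between the one-sided $\ell_1$ objective and the BIHT update direction can be checked numerically. The sketch below is our own illustration: it evaluates $J(x) = \|[y \odot (\Phi x)]_-\|_1$ and the vector $\frac{1}{2}\Phi^T(A(x) - y)$, then tests the subgradient inequality $J(z) \ge J(x) + \langle g, z - x \rangle$ at random points $z$. Function names, sizes, and the sign(0) := +1 convention are assumptions.

```python
import numpy as np

def one_sided_l1(x, y, Phi):
    """J(x) = || [y * (Phi x)]_- ||_1: total magnitude of the sign-violating measurements."""
    return np.sum(np.abs(np.minimum(y * (Phi @ x), 0.0)))

def subgradient(x, y, Phi):
    """(1/2) Phi^T (A(x) - y) with A(x) = sign(Phi x) and sign(0) := +1 (assumed)."""
    A_x = np.where(Phi @ x >= 0, 1.0, -1.0)
    return 0.5 * Phi.T @ (A_x - y)

rng = np.random.default_rng(4)
M, N = 60, 10                                        # illustrative sizes
Phi = rng.standard_normal((M, N))
y = np.where(rng.standard_normal(M) >= 0, 1.0, -1.0) # arbitrary sign pattern
x = rng.standard_normal(N)
g = subgradient(x, y, Phi)

# Count violations of J(z) >= J(x) + <g, z - x> over random test points z.
violations = 0
for _ in range(1000):
    z = rng.standard_normal(N)
    if one_sided_l1(z, y, Phi) < one_sided_l1(x, y, Phi) + g @ (z - x) - 1e-9:
        violations += 1
print("violations of the subgradient inequality:", violations)
```

When $y$ equals the signs of $\Phi x$, both the objective and this subgradient vanish, which is the consistency condition (15) expressed in code.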

Previously proposed 1-bit CS algorithms have used a one-sided $\ell_2$-norm to impose
consistency [29-32]. Specifically, they have applied a constraint or objective that takes the form
$\|[y \odot (\Phi x)]_-\|_2^2/2$. Both the one-sided $\ell_1$ and $\ell_2$ functions imply a consistent solution when they
evaluate to zero, and thus, both approaches are capable of enforcing consistency. However, the
choice of the $\ell_1$ vs. $\ell_2$ penalty term makes a significant difference in performance depending on the
noise conditions. We explore this difference in the experiments in Section 5.



4.2 BIHT shifts 



Several modifications can be made to the BIHT algorithm that may improve certain performance 
aspects, such as consistency, reconstruction error, or convergence speed. While a comprehensive 
comparison is beyond the scope of this paper, we believe that such variations exhibit interesting 
and useful properties that should be mentioned. 

Projection onto sphere at each iteration. We can enforce that every intermediate solution
have unit $\ell_2$ norm. To do this, we modify the "impose signal model" step (14) by normalizing
after choosing the best $K$-term approximation, i.e., we replace the update of $x^{l+1}$ in (14) by
$x^{l+1} = U(\eta_K(a^{l+1}))$, where $U(v) = v/\|v\|_2$. While this step is found in previous algorithms
such as [30-32], empirical observations suggest that it is not required for BIHT to converge to an
appropriate solution.

If we choose to impose the projection, $\Phi$ must be appropriately normalized or, equivalently,
the step size of the gradient descent must be carefully chosen. Otherwise, the algorithm will
not converge. Empirically, we have found that for a Gaussian matrix, an appropriate scaling is
$1/(\sqrt{M}\,\|\Phi\|_2)$, where the $1/\|\Phi\|_2$ controls the amplification of the estimate from $\Phi^T$ in the gradient
descent step (13) and the $1/\sqrt{M}$ ensures that $\|y - A(x^l)\|_2 \le 2$. Similar gradient step scaling
requirements have been imposed in the conventional IHT algorithm and other sparse recovery 
algorithms as well (e.g., P). 

Minimizing hinge loss. The one-sided $\ell_1$-norm is related to the hinge-loss function in the
machine learning literature, which is known for its robustness to outliers [61]. Binary classification
algorithms seek to enforce the same consistency function as in (15) by minimizing a function
$\sum_i [\kappa - y_i (\Phi x)_i]_+ = \|[\kappa \mathbf{1} - y \odot (\Phi x)]_+\|_1$, where $[\cdot]_+$ sets negative elements to zero. When $\kappa > 0$,
the objective is both convex and has a non-trivial solution. Further connections and interpretations
are discussed in Section 5. Thus, rather than minimizing the one-sided $\ell_1$ norm, we can instead
minimize the hinge loss. The hinge loss can be interpreted as ensuring that the minimum value
that an unquantized measurement $(\Phi x)_i$ can take is bounded away from zero, i.e., $|(\Phi x)_i| \ge \kappa$.
This requirement is similar to the sphere constraint in that it avoids a trivial solution; however, it
will perform differently than the sphere constraint. In this case, in the gradient descent step (13),
we instead compute

$$a^{l+1} = x^l - \tau\, \Theta^T \big(\mathrm{sign}(\Theta x^l - \kappa) - 1\big)/2,$$

where $\Theta = (y \odot \Phi)$ scales the rows of $\Phi$ by the signs of $y$. Again, the step size must be chosen
appropriately, this time as $C_\kappa/\|\Phi\|_2$, where $C_\kappa$ is a parameter that depends on $\kappa$.
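The hinge-loss variant changes only the gradient step. As a hedged illustration (our code, with hypothetical names and an arbitrary small step size rather than the $C_\kappa/\|\Phi\|_2$ rule, whose constant is not specified here), the sketch below evaluates the objective $\sum_i [\kappa - y_i(\Phi x)_i]_+$ and applies one update of the form $x - \tau\,\Theta^T(\mathrm{sign}(\Theta x - \kappa) - 1)/2$.

```python
import numpy as np

def hinge_objective(x, y, Phi, kappa):
    """sum_i [kappa - y_i (Phi x)_i]_+ ."""
    return np.sum(np.maximum(kappa - y * (Phi @ x), 0.0))

def hinge_step(x, y, Phi, kappa, tau):
    """One gradient-descent update a = x - tau * Theta^T (sign(Theta x - kappa) - 1) / 2."""
    Theta = y[:, None] * Phi                         # scale row i of Phi by the sign y_i
    margin_sign = np.where(Theta @ x - kappa >= 0, 1.0, -1.0)
    return x - tau * Theta.T @ (margin_sign - 1.0) / 2.0

rng = np.random.default_rng(5)
M, N, kappa = 60, 10, 0.1                            # illustrative values
Phi = rng.standard_normal((M, N))
y = np.where(rng.standard_normal(M) >= 0, 1.0, -1.0)
x = rng.standard_normal(N)

a = hinge_step(x, y, Phi, kappa, tau=1e-4)           # small step chosen for the demo only
before = hinge_objective(x, y, Phi, kappa)
after = hinge_objective(a, y, Phi, kappa)
print(f"hinge objective: {before:.4f} -> {after:.4f}")
```

The factor $(\mathrm{sign}(\cdot) - 1)/2$ acts as an indicator: it equals $-1$ exactly for the measurements whose margin falls below $\kappa$ and $0$ otherwise, so only those rows of $\Theta$ contribute to the update.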

General one-sided objectives. In general, any function $\mathcal{R}(x) = \sum_i \mathcal{R}_i(x_i)$, where $\mathcal{R}_i$ is
continuous and has a negative gradient for $x_i < 0$ and is $0$ for $x_i > 0$, can be used to enforce
consistency. To employ such functions, we simply compute the gradient of $\mathcal{R}$ and apply it in (13).
As an example, the previously mentioned one-sided $\ell_2$-norm has been used to enforce consistency
in several algorithms. We can use it in BIHT by computing

$$a^{l+1} = x^l - \tau\, \Theta^T [\Theta x^l]_-, \qquad \Theta = (y \odot \Phi),$$

in (13). We compare and contrast the behavior of the one-sided $\ell_1$ and $\ell_2$ norms in Section 5.

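As another example, in similar fashion to the Huber norm [15], we can combine the $\ell_1$ and $\ell_2$
functions in a piecewise fashion. One potentially useful objective is $\sum_i \mathcal{R}_i(x)$, where $\mathcal{R}_i$ is defined
as follows:

$$\mathcal{R}_i(x) = \begin{cases} 0, & x_i \ge 0, \\ |x_i|, & -\tfrac{1}{2} \le x_i < 0, \\ x_i^2 + \tfrac{1}{4}, & x_i < -\tfrac{1}{2}. \end{cases} \qquad (16)$$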

While similar, this is not exactly a one-sided Huber norm. In a one-sided Huber norm, the square
($\ell_2$) term would be applied to values near zero and the magnitude ($\ell_1$) term would be applied to
values significantly less than zero, the reverse of what we propose here.
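A small sketch of this hybrid cost may help. It assumes the piecewise definition reads as $0$ for $t \ge 0$, $|t|$ for $-\frac{1}{2} \le t < 0$, and $t^2 + \frac{1}{4}$ for $t < -\frac{1}{2}$ (the reading under which both the cost and its slope are continuous at $t = -\frac{1}{2}$), and that it is applied elementwise to $\Theta x$ with $\Theta = (y \odot \Phi)$. Function names and the step helper are ours.

```python
import numpy as np

def hybrid_cost(t):
    """Elementwise cost: 0 for t >= 0, |t| for -1/2 <= t < 0, t^2 + 1/4 for t < -1/2."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, 0.0, np.where(t >= -0.5, -t, t**2 + 0.25))

def hybrid_grad(t):
    """Derivative of hybrid_cost away from t = 0: 0, -1, or 2t on the three pieces."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, 0.0, np.where(t >= -0.5, -1.0, 2.0 * t))

def hybrid_step(x, y, Phi, tau):
    """Gradient-descent step on sum_i hybrid_cost((Theta x)_i) with Theta = y * Phi (rows)."""
    Theta = y[:, None] * Phi
    return x - tau * Theta.T @ hybrid_grad(Theta @ x)

# Tiny worked example: the second measurement has the wrong sign with a large margin.
x0 = np.ones(3)
y0 = np.array([1.0, -1.0])
Phi0 = np.ones((2, 3))
x1 = hybrid_step(x0, y0, Phi0, tau=0.1)
print(hybrid_cost([0.3, -0.25, -0.5, -2.0]), x1)
```

In the worked example the violated measurement moves from $(\Theta x)_2 = -3$ toward zero after one step, which is the intended effect of any one-sided objective of this family.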

This hybrid objective can provide different robustness properties or convergence rates than
the previously mentioned objectives. Specifically, during each iteration it may allow us to take
advantage of the shallow gradient of the one-sided $\ell_2$ cost for large numbers of measurement sign
discrepancies and the steeper gradient of the one-sided $\ell_1$ cost when most measurements have the
correct sign. This objective can be applied in BIHT as with the other objectives, by computing its
gradient and plugging it into (13).



5 Experiments 

In this section we explore the performance of the BIHT algorithm and compare it to the
performance of previous 1-bit CS algorithms. To make the comparison as straightforward as possible, we
reproduced the experiments of [32] with the BIHT algorithm.

The experimental setup is as follows. For each data point, we draw a length-$N$, $K$-sparse signal
with the non-zero entries drawn uniformly at random on the unit sphere, and we draw a new
$M \times N$ matrix $\Phi$ with each entry $\varphi_{ij} \sim \mathcal{N}(0, 1)$. We then compute the binary measurements $y$
according to (3). Reconstruction of $x^*$ is performed from $y$ with three algorithms: matching sign
pursuit (MSP) [31], restricted-step shrinkage (RSS) [32], and BIHT (this paper); the algorithms
will be depicted by dashed, dotted, and triangle lines, respectively. Each reconstruction in this
setup is repeated for 1000 trials and with a fixed $N = 1000$ and $K = 10$ unless otherwise noted.
Furthermore, we perform the trials for $M/N$ within the range $[0, 2]$. Note that when $M/N > 1$, we
are acquiring more measurements than the ambient dimension of the signal. While the $M/N > 1$
regime is not interesting in conventional CS, it may be very practical in 1-bit systems that can
acquire sign measurements at extremely high, super-Nyquist rates.
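The data generation for one trial of this setup can be sketched as follows. This is our illustration of the description above, not the authors' experimental code: the helper names are hypothetical, sign(0) := +1 is an assumed convention, and the SNR helper uses the ratio $\|x\|_2^2/\|x - x^*\|_2^2$ in decibels.

```python
import numpy as np

def draw_trial(N, K, M, rng):
    """One trial: unit-norm K-sparse x, Gaussian Phi, and sign measurements y."""
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    coeffs = rng.standard_normal(K)
    x[support] = coeffs / np.linalg.norm(coeffs)   # normalized Gaussian: uniform on the sphere
    Phi = rng.standard_normal((M, N))
    y = np.where(Phi @ x >= 0, 1.0, -1.0)          # sign(0) := +1 is our convention
    return x, Phi, y

def snr_db(x, x_hat):
    """Reconstruction SNR in dB: 10 log10(||x||^2 / ||x - x_hat||^2)."""
    return 10.0 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))

def angular_error(x, x_hat):
    """Normalized angle between two unit-norm vectors."""
    return np.arccos(np.clip(x @ x_hat, -1.0, 1.0)) / np.pi

rng = np.random.default_rng(6)
N, K = 1000, 10
M = int(0.5 * N)                                   # one point on the M/N axis
x, Phi, y = draw_trial(N, K, M, rng)
print(Phi.shape, y[:5], snr_db(x, 0.9 * x))
```

A full experiment would loop this over the desired $M/N$ values and trial count, run each reconstruction algorithm on $(y, \Phi, K)$, and average the two error measures.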

Average error. We begin by measuring the average reconstruction angular error $\epsilon_{\mathrm{sim}} :=
d_S(x, x^*)$ over the 1000 trials. Figure 2 displays the results of this experiment in two different
ways: (i) the signal-to-noise ratio (SNR)$^6$ in Figure 2(a), to demonstrate that the performance of
these techniques is practical (since the angular error is unintuitive to most observers), and (ii) the
inverse of the angular error, i.e., $\epsilon_{\mathrm{sim}}^{-1}$, in Figure 2(b), to compare with the performance predicted
by Theorem 2.

Figure 2: Average reconstruction angular error $\epsilon_{\mathrm{sim}}$ vs. $M/N$, plotted two ways: (a) SNR in decibels, and (b)
inverse angular error $\epsilon_{\mathrm{sim}}^{-1}$. The plot demonstrates that BIHT yields a considerable improvement in reconstruction
error, achieving an SNR as high as 40 dB when $M/N = 2$. Furthermore, we see that the error behaves according to
$\epsilon_{\mathrm{sim}} = O(1/M)$, implying that on average we achieve the optimal performance rate given in Theorem 1.

We begin by comparing the performance of the algorithms. While we can observe that the
angular error of each algorithm follows the same trend, BIHT obtains smaller error (or higher SNR)
than the others, significantly so when $M/N$ is greater than 0.35. The discrepancy in performance
could be due to differences in the algorithms themselves or, perhaps, differences in their formulations
for enforcing consistency. This is explored later in this section.

We now consider the actual performance trend. We see from Figure 2(b) that, above $M/N =
0.35$, each line appears fairly linear, albeit with a different slope, implying that with all other
variables fixed, $\epsilon_{\mathrm{sim}} = O(1/M)$. This is on the order of the optimal performance as given by the
bound in Theorem 1 and predicted by Theorem 2 for Gaussian matrices.

Consistency. We also expose the relationship between the Hamming distance $d_H(A(x), A(x^*))$
between the measurements of the true and reconstructed signal and the angular error of the true and
reconstructed signal. Figure 3 depicts the Hamming distance vs. angular error for three different
values of $M/N$. The particularly striking result is that BIHT returns significantly more consistent
reconstructions than the two other algorithms. We observed this effect for ratios $M/N$ as small as
$M/N = 0.1$ (Figure 3(a)). This is clear from the fact that most of the red (plus) points lie on the
y-axis while the majority of blue (dot) or green (triangle) points do not. We find that, even in
significantly "under-sampled" regimes like $M/N = 0.1$, where the BεSE is unlikely to hold, BIHT is
likely to return a consistent solution (albeit with high variance of angular errors). We also find that
in "over-sampled" regimes such as $M/N = 1.7$, the range of angular errors on the y-axis is small.
Indeed, the range of angular errors shrinks as $M/N$ increases, implying an empirical tightening of
the BεSE upper and lower bounds.

We can infer an interesting performance trend from Figures 3(b) and (c), where the BεSE
property may hold. Since the RSS and MSP algorithms often do not return a consistent solution,
we can visualize the relationship between angular error and Hamming error. Specifically, on average
the angular reconstruction error is a linear function of Hamming error, $\epsilon_H = d_H(A(x), A(x^*))$, as
similarly expressed by the reconstruction error bound provided by BεSE. Furthermore, if we let
$\epsilon_{1000}$ be the largest angular error (with consistent measurements) over 1000 trials, then we can
suggest an empirical upper bound for BIHT of $\epsilon_{1000} + \epsilon_H$. This upper bound is denoted by the
dashed line in Figures 3(b) and (c).

6 In this paper we define the reconstruction SNR in decibels as $\mathrm{SNR}(x) := 10 \log_{10}(\|x\|_2^2 / \|x - x^*\|_2^2)$. Note that
this metric uses the standard Euclidean error and not angular error.

Figure 3: Reconstruction angular error $\epsilon_{\mathrm{sim}}$ vs. measurement Hamming error $\epsilon_H$. BIHT returns a consistent solution
in most trials. For sufficiently large $M/N$ regimes, we see a linear relationship $\epsilon_{\mathrm{sim}} \approx C + \epsilon_H$ between the average
angular error $\epsilon_{\mathrm{sim}}$ and the Hamming error $\epsilon_H$, where $C$ is constant (see (a) and (b)). The BεSE formulation in
Definition 1 predicts that the angular error is bounded by the Hamming error $\epsilon_H$ in addition to an offset $\epsilon$. The
dashed line $\epsilon_{1000} + \epsilon_H$ denotes the empirical upper bound for 1000 trials.

One-sided $\ell_1$ vs. one-sided $\ell_2$ objectives. As demonstrated in Figures 2 and 3, the BIHT
algorithm achieves significantly improved performance over MSP and RSS in both angular error
and Hamming error (consistency). A significant difference between these algorithms and BIHT is
that MSP and RSS seek to impose consistency via a one-sided $\ell_2$-norm, as described in Section 4.2.
Minimizing either the one-sided $\ell_1$ or one-sided $\ell_2$ objectives will enforce consistency on the
measurements of the solution; however, the behavior of these two terms appears to be significantly
different, according to the previously discussed experiments.

To test the hypothesis that this term is the key differentiator between the algorithms, we
implemented BIHT-$\ell_2$, a one-sided $\ell_2$ variation of the BIHT algorithm that enabled a fair comparison
of the one-sided objectives (see Section 4.2 for details). We compared both the angular error and
Hamming error performance of BIHT and BIHT-$\ell_2$. Furthermore, we implemented oracle-assisted
variations of these algorithms where the true support of the signal is given a priori, i.e., $\eta_K$ in (14)
is replaced by an operator that always selects the true support, and thus the algorithm only needs
to estimate the correct coefficient values. The oracle-assisted case can be thought of as a "best
performance" bound for these algorithms. Using these algorithms, we perform the same experiment
detailed at the beginning of the section.

The results are depicted in Figure 4. The angular error behavior of BIHT-$\ell_2$ is very similar to
that of MSP and RSS and underperforms when compared to BIHT. We see the same situation with
regard to Hamming error: BIHT finds consistent solutions for the majority of trials, but BIHT-$\ell_2$
does not.

Thus, the results of this simulation suggest that the one-sided term plays a significant role in
the quality of the solution obtained.
Figure 4: Enforcing consistency: One-sided $\ell_1$ vs. one-sided $\ell_2$ BIHT. When BIHT attempts to minimize a one-sided
$\ell_2$ instead of a one-sided $\ell_1$ objective, the performance significantly decreases. We find this to be the case even when
an oracle provides the true signal support a priori. Note: (c) is simply a zoomed version of (b).



One way to explain the performance discrepancy between the two objectives comes from
observing the connection between our reconstruction problem and binary classification. As explained
previously, in the classification context, the one-sided $\ell_1$ objective is similar to the hinge loss, and
furthermore, the one-sided $\ell_2$ objective is similar to the so-called square loss. Previous results in
machine learning have shown that for typical convex loss functions, the minimizer of the hinge
loss has the tightest bound between expected risk and the Bayes optimal solution [62] and good
error rates, especially when considering robustness to outliers [62, 63]. Thus, the hinge loss is often
considered superior to the square loss for binary classification.$^7$ One might suspect that since the
one-sided $\ell_1$-objective is very similar to the hinge loss, it too should outperform other objectives in
our context. Understanding why, in our context, the geometry of the $\ell_1$ and $\ell_2$ objectives results in
different performance is an interesting open problem.

We probed the one-sided $\ell_1$/$\ell_2$ objectives further by testing the two versions of BIHT on noisy
measurements. We flipped a number of measurement signs at random in each trial. For this
experiment, $N = M = 1000$ and $K = 10$ are fixed, and we performed 100 trials. We varied the
number of sign flips between 0% and 5% of the measurements. The results of the experiment are
depicted in Figure 5. We see that for both the angular error in Figure 5(a) and the Hamming error
in Figure 5(b), the one-sided $\ell_1$ objective performs better when there are only a few errors
and the one-sided $\ell_2$ objective performs better when there are significantly more errors. This is
expected since the $\ell_1$ objective promotes sparse errors. This experiment implies that BIHT-$\ell_2$
(and the other one-sided $\ell_2$-based algorithms) may be more useful when the measurements contain
significant noise that might cause a large number of sign flips, such as Gaussian noise.

7 Additional "well-behaved" loss functions (e.g., the Huber-ized hinge loss) have been proposed [M] and a host
of classification algorithms related to this problem exist [63-67], both of which may prove useful in the 1-bit CS
framework in the future.

Figure 5: Enforcing consistency with noise: One-sided $\ell_1$ vs. one-sided $\ell_2$ BIHT. When BIHT attempts to minimize
a one-sided $\ell_2$ instead of the one-sided $\ell_1$ objective, the algorithm is more robust to flips of measurement signs. Note
that the Hamming error in (b) is measured with regard to the noisy measurements, e.g., a Hamming error of zero
means that we reconstructed the signs of the noisy measurements exactly.
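The sign-flip corruption used in this experiment is simple to reproduce. The sketch below is our own illustration with hypothetical helper names: it flips a given fraction of randomly chosen measurement signs and measures the Hamming error of a candidate sign vector against the noisy measurements, which is the convention stated in the caption of Figure 5.

```python
import numpy as np

def flip_signs(y, fraction, rng):
    """Return a copy of y with round(fraction * M) randomly chosen entries negated."""
    y_noisy = y.copy()
    n_flips = int(round(fraction * y.size))
    idx = rng.choice(y.size, size=n_flips, replace=False)
    y_noisy[idx] *= -1
    return y_noisy

def hamming_to_noisy(signs_of_estimate, y_noisy):
    """Normalized Hamming error measured against the corrupted measurements."""
    return np.mean(signs_of_estimate != y_noisy)

rng = np.random.default_rng(7)
M = 1000
y = np.where(rng.standard_normal(M) >= 0, 1.0, -1.0)   # stand-in for clean sign measurements
y5 = flip_signs(y, 0.05, rng)                           # 5% of the signs flipped
print(np.sum(y5 != y), hamming_to_noisy(y, y5))
```

An estimate that reproduces the clean signs exactly therefore shows a Hamming error equal to the flip fraction under this convention, while an estimate that fits the noise shows zero.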

Performance with a fixed bit-budget. In some applications we are interested in reducing 
the total number of bits acquired due to storage or communication costs. Thus, given a fixed total 
number of bits, an interesting question is how well 1-bit CS performs in comparison to conventional 
CS quantization schemes and algorithms. For the sake of brevity, we give a simple comparison here 
between the 1-bit techniques and uniform quantization with Basis Pursuit DeNoising (BPDN) [8] 
reconstruction. While BPDN is not the optimal reconstruction technique for quantized measurements,
it (and its variants such as the LASSO [64]) is considered a benchmark technique for
reconstruction from measurements with noise and, furthermore, is widely used in practice.

The experiment proceeds as follows. Given a total number of bits and a (uniform) quantization
bit-depth B (i.e., number of bits per measurement), we choose the number of measurements as
M = total bits/B, with N = 2000 and sparsity K = 20. The remainder of the experiment proceeds
as described earlier (in terms of drawing matrices and signals). For bit-depths greater than 1, we
reconstruct using BPDN with an optimal choice of noise parameter, and we scale the quantizer
such that the signal can take full advantage of its dynamic range.
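
The trade-off behind this setup can be made concrete with a short sketch. The function name `measurements_for_budget` and the budget of 2000 bits are hypothetical choices for illustration; only the relation M = total bits/B comes from the text.

```python
def measurements_for_budget(total_bits, bit_depth):
    """Number of measurements M = total_bits / B affordable at bit-depth B (rounded down)."""
    if bit_depth < 1:
        raise ValueError("bit depth must be at least 1")
    return total_bits // bit_depth

budget = 2000  # hypothetical total number of bits
for B in (1, 2, 4, 8):
    # Lower bit-depth buys more measurements for the same budget.
    print(B, measurements_for_budget(budget, B))
```

With a fixed budget, halving the bit-depth doubles the number of measurements, which is why the 1-bit curve reaches its working regime at a smaller total number of bits.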

The results of this experiment are depicted in Figure 6. We see a common trend in each line:
lackluster performance until "sufficient" measurements are acquired, then a slow but steady increase
in performance as additional measurements are added, until a performance plateau is reached. Thus,
since lower bit-depth implies that a larger number of measurements will be used, 1-bit CS reaches
the performance plateau earlier than in the multi-bit case (indeed, the transition point is achieved
at a higher number of total bits as the bit-depth is increased). This enables significantly improved
performance when the rate is severely constrained and higher bit-rates per measurement would
significantly reduce the number of available measurements. For higher bit-rates, as expected from the
analysis in [37], using fewer measurements with refined quantization achieves better performance.

It is also important to note that, regardless of trend, the BIHT algorithm performs strictly better 
than BPDN with 4 bits per measurement and uniform quantization for the parameters tested here. 
This gain is consistent with similar gains observed in [30, 31]. A more thorough comparison of
additional CS quantization techniques with 1-bit CS is a subject for future study.

Comparison to quantized Nyquist samples. In our final experiment, we compare the 
performance of the 1-bit CS technique to the performance of a conventional uniform quantizer 
Figure 6: Comparison of BIHT to conventional CS multibit uniform scalar quantization (multibit reconstructions
performed using BPDN [8]). BIHT is competitive with standard CS working with multibit measurements when the
total number of bits is severely constrained. In particular, the BIHT algorithm performs strictly better than CS with
4 bits per measurement.



applied to uniform Nyquist-rate samples. Specifically, in each trial we draw a new Nyquist-sampled
signal in the same way as in our previous experiments and with fixed N = 2000 and K = 20;
however, now the signals are sparse in the discrete cosine transform (DCT) domain. We consider
four reconstruction experiments. First, we quantize the Nyquist-rate signal with a bit-depth of $\beta$
bits per time-domain sample (and optimal quantizer scale) and perform linear reconstruction (i.e.,
we just use the quantized samples as sample values). Second, we apply BPDN to the quantized
Nyquist-rate samples with optimal choice of noise parameter, thus denoising the signal using a
sparsity model. Third, we draw a new Gaussian matrix with M = N, quantize the measurements
to $\beta$ bits, again at optimal quantizer scale, and reconstruct using BPDN. Fourth, we draw a new
Gaussian matrix with $M = \beta N$ and compute measurements, quantize to one bit per measurement
by keeping only their sign, and perform reconstruction with BIHT. Note that the same total number
of bits is used in each experiment.

Figure 7 depicts the average SNR obtained by performing 100 of the above trials. The linear,
BPDN, Gaussian measurements with BPDN, and BIHT reconstructions are depicted by solid,
dashed, dash-circled, and dash-dotted lines, respectively. The linear reconstruction has a slope of
6.02 dB/bit-depth, exhibiting a well-known trade-off for conventional uniform quantization. The
BPDN reconstruction (without projections) follows the same trend, but obtains an SNR that is
at least 10 dB higher than the linear reconstruction. This is because BPDN imposes the sparse
signal model to denoise the signal. We see about the same performance with the Gaussian projections
at M = N, although it performs slightly worse than without projections since the Gaussian
measurements require a slightly larger quantizer range. Similarly to the results in Fig. 6, in low
Nyquist bit-depth regimes ($\beta < 6$), 1-bit CS achieves a significantly higher SNR than the other
two techniques. When $6 < \beta < 8$, 1-bit CS is competitive with the BPDN scenario. Thus, for a
fixed number of bits, 1-bit CS is competitive with conventional sampling with uniform quantization,
especially in low bit-depth regimes.
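
The 6.02 dB/bit-depth slope of the linear reconstruction equals $20\log_{10} 2$ and can be checked empirically. The sketch below is an illustration under simplifying assumptions, not the experiment of the paper: it quantizes i.i.d. samples drawn uniformly from [-1, 1) rather than DCT-sparse signals, and the names `quantize` and `snr_db` are hypothetical.

```python
import math
import random

def quantize(x, bits):
    """Uniform mid-rise quantizer on [-1, 1) with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def snr_db(signal, estimate):
    """Signal-to-noise ratio in dB between a signal and its estimate."""
    power = sum(v * v for v in signal)
    noise = sum((v - w) ** 2 for v, w in zip(signal, estimate))
    return 10.0 * math.log10(power / noise)

rng = random.Random(0)  # fixed seed, illustration only
samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
snrs = {b: snr_db(samples, [quantize(v, b) for v in samples]) for b in range(2, 11)}
slopes = [snrs[b + 1] - snrs[b] for b in range(2, 10)]
print(slopes, 20.0 * math.log10(2.0))
```

Each additional bit halves the quantization step and hence reduces the noise power by a factor of four, which is the origin of the slope.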
Figure 7: Comparison of uniformly quantized Nyquist-rate samples with linear reconstruction (solid) and BPDN
denoising (dashed), CS with M = N and BPDN reconstruction (dash-circle), and 1-bit quantized CS measurements
with BIHT reconstruction (dash-dotted). Nyquist samples were quantized with bit-depth $\beta \in [2, 10]$ and 1-bit CS
used $M = \beta N$ measurements; the same number of bits is used in each reconstruction. The Nyquist-rate lines have
the classical 6.02 dB/bit-depth slope, as expected. For a fixed number of bits, 1-bit CS does not follow this slope and
outperforms conventional quantization when $\beta < 6$.

6 Discussion 

In this paper we have developed a rigorous mathematical foundation for 1-bit CS. Specifically, we 
have demonstrated a lower bound on reconstruction error as a function of the number of measure- 
ments and the sparsity of the signal. We have demonstrated that Gaussian random projections 
almost reach this lower bound (up to a log factor) in the noiseless case. This behavior is consis- 
tent with and extends existing results in the literature on multibit scalar quantization and 1-bit 
quantization of non-sparse signals. 

We have also introduced reconstruction robustness guarantees through the binary $\epsilon$-stable
embedding (B$\epsilon$SE) property. This property can be thought of as extending the RIP to 1-bit quantized
measurements. To our knowledge, this is the first time such a property has been introduced in
the context of quantization. To be able to use this property we showed that random constructions
using Gaussian pdfs (or, more generally, isotropic pdfs in the signal space $\mathbb{R}^N$) generate such
embeddings with high probability. This construction class is still very limited compared to the
numerous random constructions known for generating RIP matrices. Extending this class with
other constructions is an interesting topic for future research.

Using the B$\epsilon$SE, we have proven that 1-bit CS systems are robust to measurement noise added
before quantization as well as to signals that are not exactly sparse but compressible.

We have introduced a new 1-bit CS algorithm, BIHT, that achieves better performance than
previous algorithms in the noiseless case. This improvement is due to the enforcement of consistency
using a one-sided linear objective, as opposed to a quadratic one. The linear objective is similar to
the hinge loss from the machine learning literature.

We remind the reader that the central goal of this paper has been signal acquisition with
quantization. As explained previously, one motivation for our work is the development of very high
speed samplers. In this case, we are interested in building fast samplers by relaxing the requirements
on the primary hardware burden, the quantizer. Such devices are susceptible to noise. Thus, while
our noiseless results extend previous 1-bit quantization results (e.g., see [53] and [51]) to the sparse
signal model setting and are of theoretical interest, a major contribution has been the further
development of the robust guarantees, even if they produce error rates that seem suboptimal when
compared to the noiseless case.



A number of interesting questions remain unanswered. As discussed in Section 3.2,
we have found that the B$\epsilon$SE holds for Gaussian matrices with angular error decay roughly on
the order of $O(\sqrt{K/M})$ worse than optimal. One question is whether this gap can be closed
with an alternative derivation, or whether it is a fundamental requirement for stability. Another
useful pursuit would be to provide a more rigorous understanding of the discrepancy between the
performance of the one-sided $\ell_1$ and $\ell_2$ objectives. Analysis of the performance behavior might lead
to better one-sided functions.
A Lemma 1: Intersections of Orthants by Subspaces

While there are $2^M$ available quantization points provided by 1-bit measurements, a $K$-sparse
signal will not use all of them. To understand how effectively the quantization bits are used,
we first investigate how the $K$-dimensional subspaces projected from the $N$-dimensional $K$-sparse
signal spaces intersect orthants in the $M$-dimensional measurement space, as shown in Fig. 1 for
$K = 2$ and $M = 3$.

We use $I(M,K)$ to denote the maximum number of orthants in $M$ dimensions intersected by a
$K$-dimensional subspace. A bound for $I(M,K)$ is developed in [68, 69]:

$$ I(M,K) \le 2 \sum_{l=0}^{K-1} \binom{M-1}{l}. \tag{17} $$

For $K < M/2$, this simplifies to $I(M,K) \le 2K\binom{M-1}{K-1}$.



Using ( £jj + (p = we can also derive a simple bound on (17) for K < M. We observe 

that 

K - L 'M-i\ fM\ ^Vm-i\ . {M\ ^{m\ ,^;Vm 
Figure 8: (a) The geometry of orthants in $\mathbb{R}^3$. (b) The geometry of spherical caps.
While the bound in (17) is tight and holds for subspaces in a general configuration, the closed-form
simplified bound in (18) can be improved by a factor of $(M - K + 1)$ (which asymptotically
makes no difference in the subsequent development) using the proof we develop in the remainder
of this appendix. In addition to the improvement, the proof also provides significant geometrical
intuition to the problem.

First we define two new elements in the geometry of the problem: orthant boundaries and
their faces. Each orthant has $M$ boundaries of dimension $M - 1$, defined as the subspace with a
coordinate set to 0:

$$ \mathcal{B}_i = \{x \mid (x)_i = 0\}. $$

We split each boundary into $2^{M-1}$ faces, defined as the set

$$ \mathcal{F}_{i,z} = \{x \mid (x)_i = 0 \text{ and } \operatorname{sign}(x)_j = (z)_j \text{ for all } j \ne i\}, $$

where $z$ is the sign vector of a bordering orthant, and $i$ is the boundary in which the face lies.
Each face borders two orthants. Note that the faces are $(M-1)$-dimensional orthants in the
$(M-1)$-dimensional boundary subspace. The geometry of the problem in $\mathbb{R}^3$ is summarized in
Figure 8(a).



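Next, we upper bound $I(M,K)$ using an inductive argument that relies on the following two
lemmas:

Lemma 6. If a $K$-dimensional subspace $\mathcal{S} \subset \mathbb{R}^M$ is not a subset of a boundary $\mathcal{B}_i$, then the
subspace and boundary do intersect and their intersection is a $(K-1)$-dimensional subspace of $\mathcal{B}_i$.

Proof. We count the dimensions of the relevant spaces. If $\mathcal{S}$ is not a subset of $\mathcal{B}_i$, then it equals the
direct sum $\mathcal{S} = (\mathcal{S} \cap \mathcal{B}_i) \oplus \mathcal{W}$, where $\mathcal{W} \subset \mathbb{R}^M$ is also not a subspace of $\mathcal{B}_i$. Since $\dim \mathcal{B}_i = M - 1$,
we have $\dim \mathcal{W} \le 1$; since $\mathcal{S}$ is not contained in $\mathcal{B}_i$, $\mathcal{W}$ is nontrivial, so $\dim \mathcal{W} = 1$ and
$\dim (\mathcal{S} \cap \mathcal{B}_i) = K - 1$ follows. □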
Lemma 7. For $K > 1$, a $K$-dimensional subspace that intersects an orthant also non-trivially
intersects at least $K$ faces bordering that orthant.

Proof. Consider a $K$-subspace $\mathcal{S}$, a point $p \in \mathcal{S}$ interior to the orthant $\mathcal{O}_{\operatorname{sign} p}$, and a vector $x_1 \in \mathcal{S}$
non-parallel to $p$. The following iterative procedure can be used to prove the result:

1. Starting from 0, grow $a$ until the set $p + a x_l$ intersects a boundary $\mathcal{B}_i$, say at $a = a_l$. It
is straightforward to show that as $a$ grows, a boundary will be intersected. The point of
intersection is in the face $\mathcal{F}_{i,\operatorname{sign} p}$. The set $\{p + a x_l \mid a \in (0, a_l)\}$ is in the orthant $\mathcal{O}_{\operatorname{sign} p}$.

2. Determine a vector $x_{l+1} \in \mathcal{S}$ parallel to all the boundaries already intersected and not parallel
to $p$, set $l = l + 1$ and iterate from step 1.

A vector can always be found in step 2 for the first $K$ iterations since $\mathcal{S}$ is $K$-dimensional. The
vector is parallel to all the boundaries intersected in the previous iterations and therefore $p + a x_l$
always intersects a boundary not intersected before. Therefore, at least $K$ distinct faces are
intersected. □



Lemmas 6 and 7 lead to the main result in this appendix. Lemma 1 in Section 2.1 follows
trivially.

Lemma 8. The number of orthants intersected by a $K$-dimensional subspace $\mathcal{S}$ in an
$M$-dimensional space $\mathcal{V}$ is upper bounded by

$$ I(M,K) \le \frac{2^K}{M-K+1}\binom{M}{K} \le 2^K\binom{M}{K}. $$

Proof. The main intuition is that since the faces on each boundary are equivalent to orthants in
the lower dimensional subspace of the boundary, the maximum number of faces intersected at each
boundary is a problem of dimension $I(M-1, K-1)$.

If $\mathcal{S}$ is contained in one of the boundaries in $\mathcal{V}$, the number of orthants of $\mathcal{V}$ intersected is
at most $I(M-1, K)$. Since $I(M,K)$ is non-decreasing in $M$ and $K$, we can ignore this case in
determining the upper bound.

If $\mathcal{S}$ is not contained in one of the boundaries then Lemma 6 shows that the intersection of $\mathcal{S}$
with any boundary $\mathcal{B}_i$ is a $(K-1)$-dimensional subspace in $\mathcal{B}_i$. To count the faces of $\mathcal{B}_i$ intersected
by $\mathcal{S}$ we use the observation in the definition of faces above, that each face is also an orthant of $\mathcal{B}_i$.
Therefore, the maximum number of faces of $\mathcal{B}_i$ intersected is a recursion of the same problem in
lower dimensions, i.e., is upper bounded by $I(M-1, K-1)$. Since there are $M$ boundaries in $\mathcal{V}$,
it follows that the number of faces in $\mathcal{V}$ intersected by $\mathcal{S}$ is upper bounded by $M \cdot I(M-1, K-1)$.

Using Lemma 7 we know that for an orthant to be intersected, at least $K$ faces adjacent to it
should be intersected. Since each face is adjacent to two orthants, the total number of orthants
intersected cannot be greater than twice the number of faces intersected divided by $K$, i.e.,
$I(M,K) \le \frac{2M}{K}\, I(M-1, K-1)$.
To complete the induction we use $I(M,1) = 2$ for all $M$; for $K = 1$ the subspace is a line through
the origin, which can intersect only two orthants.$^8$ This leads to:

$$ I(M,K) \le \frac{2^K}{M-K+1}\binom{M}{K} \le 2^K\binom{M}{K} \le 2^K\left(\frac{eM}{K}\right)^K. $$



□ 
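
The bounds of this appendix can be compared numerically. The sketch below is an illustration with hypothetical function names; it unrolls the recursion $I(M,K) \le \frac{2M}{K} I(M-1,K-1)$ with $I(\cdot,1) = 2$ from the proof of Lemma 8 and compares the result with the closed form of Lemma 8 and with the general-position bound (17), using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def general_position_bound(M, K):
    """Bound (17): 2 * sum_{l=0}^{K-1} C(M-1, l)."""
    return 2 * sum(comb(M - 1, l) for l in range(K))

def recursive_bound(M, K):
    """Unroll I(M, K) <= (2M / K) * I(M-1, K-1) down to I(., 1) = 2."""
    if K == 1:
        return Fraction(2)
    return Fraction(2 * M, K) * recursive_bound(M - 1, K - 1)

def lemma8_bound(M, K):
    """Closed form of Lemma 8: 2^K / (M - K + 1) * C(M, K)."""
    return Fraction(2 ** K * comb(M, K), M - K + 1)

for M, K in [(3, 2), (10, 3), (20, 5)]:
    print(M, K, general_position_bound(M, K), lemma8_bound(M, K), 2 ** K * comb(M, K))
```

For K = 2 and M = 3 both (17) and Lemma 8 give 6 orthants, while the looser bound $2^K\binom{M}{K}$ gives 12.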



B Theorem 1: Distributing Signals to Quantization Points

To prove Theorem 1 we consider how the available quantization points optimally cover the set of
signals $\Sigma^*_K = \{x \in \mathbb{R}^N : \|x\| = 1, \|x\|_0 \le K\}$. This set corresponds to the union of $L = \binom{N}{K}$
$K$-dimensional unit spheres $\bigcup_{i \in [L]} S_i$, each $S_i = \{x \in \mathbb{R}^N : \|x\| = 1, \operatorname{supp} x \subset T_i\}$ being associated
to one support $T_i \subset [N]$ taken amongst the $L$ available $K$-length supports of $\mathbb{R}^N$. The cover should
be optimal with respect to the worst case distance, denoted by $r$, of any point in $\Sigma^*_K$ to its closest
quantization point. Our goal is to determine a lower bound on the best-case $r$ we can achieve.

Unfortunately, determining the optimal cover of $\Sigma^*_K$ is not straightforward. For instance, optimally
covering each $S_i$ individually does not produce an optimal cover for their union. Indeed, at
the intersection of different $K$-dimensional spheres in $\Sigma^*_K$ there are $K$-sparse signals, with different
support, very close to each other. A single quantization point in $\mathbb{R}^N$ could be close to all those
signals, whereas independent cover of each $S_i$ would have to use a different quantization point to
represent the signals on each sphere in that intersection.

Thus, instead of determining the optimal cover of $\Sigma^*_K$, we establish a lower bound on the $r$ required
to cover a subset $\tilde\Sigma^*_K$ of $\Sigma^*_K$ using the same number of points. A cover of $\Sigma^*_K$ with a smaller $r$
would not be possible, since that would also cover $\tilde\Sigma^*_K$ with the same or smaller $r$. Therefore, this
$r$ establishes a lower bound for the cover of $\Sigma^*_K$. To establish the bound, we pick $\tilde\Sigma^*_K$ such that the
neighborhood around the intersection of the balls is not included. Specifically, we pick

$$ \tilde\Sigma^*_K := \bigcup_i \tilde S_i \subset \Sigma^*_K, \qquad \tilde S_i := \{x \in S_i : \forall k \in T_i,\ |x_k| > 2r\} \subset S_i. $$

In other words, we pick the subset of each $K$-ball, such that all the nonzero coordinates of the
signals in the subset are greater than $2r$ in magnitude. The union of those subsets for all possible supports
comprises $\tilde\Sigma^*_K$. This choice ensures that any signal $x \in \tilde S_i$ has distance at least $2r$ from any signal
in any other $\tilde S_j$, $j \ne i$, and therefore both cannot be close to a common quantization point in $\mathbb{R}^N$
with distance $r$ from each. Notice that $\tilde S_i$ is non-empty as soon as $r < 1/(2\sqrt{K})$.

With this choice of $\tilde\Sigma^*_K$, an optimal covering can be obtained by merging the optimal coverings
of each $\tilde S_i$ for $i \in [L]$. For an optimal covering of distance $r$, each of the elements in $\tilde S_i$ should
belong to some ball of radius $r$ centered at the quantization point. Thus, each quantization point
and its corresponding $r$-ball should cover as large an area of $\tilde S_i$ as possible. This is achieved when
the quantization point is on $S_i$, and the intersection of the ball with $S_i$ is a spherical cap of radius
$r$ [70]. Thus, for the spherical caps to cover $\tilde S_i$, the total area of all spherical caps in the cover
should be greater than the area of $\tilde S_i$. Furthermore, since the overall cover of $\tilde\Sigma^*_K$ is composed of

8 We recall that, from the definition H, two different orthants have an empty intersection. 
separate covers of each $\tilde S_i$, the total area of all spherical caps used for the cover should be greater
than the area of $\tilde\Sigma^*_K$.

Therefore, since $2^K L \binom{M}{K}$ quantization points and corresponding spherical caps are available,
according to Lemma 1 and Corollary 1, to cover $L$ subsets of $K$-spheres, as described above, the
cover should satisfy

$$ 2^K L \binom{M}{K}\, \kappa(r) \ge \sigma(\tilde\Sigma^*_K), \tag{20} $$

where $\sigma(\cdot)$ denotes the rotationally invariant area measure on the $K$-sphere $S_i$ and $\kappa(r)$ denotes the
surface of a spherical cap of radius $r$.

To determine the smallest $r$ satisfying (20), we thus need to measure the set $\tilde\Sigma^*_K$. Choosing one
$i \in [L]$ we have $\sigma(\tilde\Sigma^*_K) = L\,\sigma(\tilde S_i)$ since the sets $\tilde S_k$ ($k \in [L]$) are disjoint with identical area. We
first show that

$$ \sigma(\tilde S_i) \ge (1 - 2r\sqrt{K})^K\, \sigma(S^{K-1}), \tag{21} $$

where $S^{K-1}$ is a $K$-dimensional sphere in $\mathbb{R}^K$. Note that this bound is tight at two ends: at $r = 0$,
where $\tilde S_i = S_i$ (almost everywhere), and at $r = 1/(2\sqrt{K})$, for which $\tilde S_i = \emptyset$.

To prove (21) we assume without loss of generality $T_i = \{1, \dots, K\}$ and consider $S_i = S^{K-1} =
\{x \in \mathbb{R}^K : \|x\| = 1\}$ and $\tilde S_i = \{x \in \mathbb{R}^K : \|x\| = 1, \min_k |x_k| > 2r\}$. We define the intersection
$\tilde S_i^+ := \tilde S_i \cap \mathcal{P}$ with the positive orthant $\mathcal{P} = \{x \in \mathbb{R}^K : \forall i \in [K], x_i \ge 0\}$. By symmetry,
$\sigma(\tilde S_i) = 2^K \sigma(\tilde S_i^+)$ since there are $2^K$ orthants in $\mathbb{R}^K$. As described in [70], the ratio between the
measure of $\tilde S_i^+$ and the one of the full sphere $S^{K-1}$ equals the ratio between the volume occupied
by a cone formed by $\tilde S_i^+$ in the unit ball $B^K \subset \mathbb{R}^K$ and the volume of this ball, i.e.,

$$ \frac{\sigma(\tilde S_i^+)}{\sigma(S^{K-1})} = \frac{\mu(C(\tilde S_i^+))}{\mu(B^K)}, \tag{22} $$

where $\mu$ is the Lebesgue measure in $\mathbb{R}^K$ and $C(A) = \{ta : t \in [0,1], a \in A\} \subset B^K$ is the portion
of the cone (with apex on the origin and restricted to $B^K$) formed by the subset $A \subset S^{K-1}$. The
geometry of our problem is made clear in Figure 9.



Figure 9: The geometry of our problem in $\mathbb{R}^2$. We only need to consider one orthant, $\mathcal{P}$, as the problem is the same
for all orthants. To measure the surface area of $\tilde S_i^+ = \tilde S_i \cap \mathcal{P}$ relative to the surface of $S^{K-1}$, we consider the ratio of the
volume of $C(\tilde S_i^+)$ with respect to that of $B^K$. We lower bound the former using $C_r(\tilde S_i^+) = (\mathcal{P} + 2r\mathbf{1}) \cap B^K \subset C(\tilde S_i^+)$.
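To measure the volume of $C(\tilde S_i^+)$, writing $\mathbf{1} = (1, \dots, 1)^T \in \mathbb{R}^K$, we first define the set

$$ C_r(\tilde S_i^+) := (\mathcal{P} + 2r\mathbf{1}) \cap B^K, $$

also shown in the figure, which is non-empty for $2r < 1/\sqrt{K}$. Since $C_r(\tilde S_i^+) \cap S_i = \tilde S_i^+$ and
$2r\mathbf{1} \in C(\tilde S_i^+)$, it is straightforward to show that $C_r(\tilde S_i^+) \subset C(\tilde S_i^+)$, and therefore the measure of
$C_r(\tilde S_i^+)$ is a lower bound to the measure of $C(\tilde S_i^+)$. Furthermore, by the translation invariance of $\mu$,

$$ \mu(C_r(\tilde S_i^+)) = \mu((\mathcal{P} + 2r\mathbf{1}) \cap B^K) = \mu(\mathcal{P} \cap (B^K - 2r\mathbf{1})), $$

where $B^K - 2r\mathbf{1}$ is the unit ball centered on $-2r\mathbf{1}$. Setting $\alpha = 1 - 2r\sqrt{K}$, it follows that
$\|u + 2r\mathbf{1}\| \le 1$ for any $u \in \mathbb{R}^K$ with $\|u\| \le \alpha$. Consequently, $\alpha B^K \subset B^K - 2r\mathbf{1}$ and $\mathcal{P} \cap \alpha B^K \subset
\mathcal{P} \cap (B^K - 2r\mathbf{1})$. Therefore, the measure of the positive orthant of a $K$-ball with radius $\alpha$ lower
bounds the measure of $C(\tilde S_i^+)$.

Putting everything together we obtain

$$ \mu(C(\tilde S_i^+)) \ge \mu(\mathcal{P} \cap \alpha B^K) = \alpha^K 2^{-K} \mu(B^K), $$

which, using (22), implies that $\sigma(\tilde S_i^+)/\sigma(S^{K-1}) \ge \alpha^K 2^{-K}$ and $\sigma(\tilde\Sigma^*_K) = L\,\sigma(\tilde S_i) \ge (1 -
2r\sqrt{K})^K L\, \sigma(S^{K-1})$.

In turn, (20) becomes

$$ 2^K \binom{M}{K}\, \kappa(r) \ge (1 - 2r\sqrt{K})^K\, \sigma(S^{K-1}), $$

where $\kappa(r) \le r^K \sigma(S^{K-1})$ [70]. From $\binom{M}{K} \le (eM/K)^K$, the result follows:

$$ 2^K \left(\frac{eM}{K}\right)^K r^K \ge (1 - 2r\sqrt{K})^K \;\Rightarrow\; r \ge \left(\frac{2eM}{K} + 2\sqrt{K}\right)^{-1} = \frac{K}{2eM + 2K^{3/2}}. $$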
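
The last step can be checked numerically. Taking $K$-th roots of the final inequality gives $2eMr/K \ge 1 - 2r\sqrt{K}$, and the stated bound is the value of $r$ at which the two sides balance. The sketch below uses hypothetical helper names and a few example values of $M$ and $K$ chosen only for illustration.

```python
import math

def cover_radius_lower_bound(M, K):
    """Lower bound r >= K / (2 e M + 2 K^{3/2}) on the worst-case cover radius."""
    return K / (2.0 * math.e * M + 2.0 * K ** 1.5)

def balance_gap(M, K, r):
    """Value of 2 e M r / K - (1 - 2 r sqrt(K)); it vanishes at the bound."""
    return 2.0 * math.e * M * r / K - (1.0 - 2.0 * r * math.sqrt(K))

for M, K in [(100, 5), (1000, 10), (4000, 20)]:
    r = cover_radius_lower_bound(M, K)
    # The bound decays like K / M and stays below 1 / (2 sqrt(K)).
    print(M, K, r, balance_gap(M, K, r))
```

For fixed $K$ the bound decays as $1/M$, which is the error decay rate that the noiseless analysis aims to match.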



C Theorem 2: Optimal Performance via Gaussian Projections

To prove Theorem 2, we follow the procedure given in [48, Theorem 3.3]. We begin by restricting
our analysis to the support set $T \subset [N] := \{1, \dots, N\}$ with $|T| \le D \le N$, and thus we consider
vectors that lie on the (sub)sphere $\Sigma^*(T) = \{x : \operatorname{supp} x \subset T, \|x\|_2 = 1\} \subset \mathbb{R}^N$. We remind the
reader that $B_r(x) := \{a \in S^{N-1} : \|x - a\|_2 < r\}$ is the ball of unit norm vectors of Euclidean
distance $r > 0$ around $x$, and we write $B^*_r(x) = B_r(x) \cap \Sigma^*(T)$ as in Section 3.2.

Let us fix a radius $\delta > 0$ to be specified later. The sphere $\Sigma^*(T)$ can be covered with a finite
set $Q_\delta \subset \Sigma^*(T)$ of no more than $(3/\delta)^D$ points such that, for any $w \in \Sigma^*(T)$, there exists a $q \in Q_\delta$
with $w \in B^*_\delta(q)$ [16].

Using the notation $d_S$ defined in Sec. 3.1, given a vector $\varphi \sim \mathcal{N}^{N \times 1}(0,1)$ and two distinct points
$p$ and $q$ in $Q_\delta$, we have that

$$ P[\forall u \in B^*_\delta(p), \forall v \in B^*_\delta(q) : \operatorname{sign} \varphi^T u \ne \operatorname{sign} \varphi^T v] \ge d_S(p,q) - \sqrt{\tfrac{\pi}{2} D}\,\delta, $$

from Lemma 9 (given in Appendix D). Since for all $u \in B^*_\delta(p)$ and $v \in B^*_\delta(q)$

$$ \pi\, d_S(p,q) \ge 2\sin\left(\tfrac{\pi}{2} d_S(p,q)\right) = \|p - q\|_2 \ge \|u - v\|_2 - 2\delta, $$

we can write, for any $\epsilon_o > 0$,

P[Vm G B* s (p),Vv G B* s (q) : sign cp T u ^ sign cp T v \ \\u - v\\ 2 > e Q ] > f - (f + y^D) 5. 
By setting <5 = 7re / (4 + 7t\/2ttD) (and reversing the inequality), we obtain 

P [3 u G 5|(p),3t7 G 51(g) : sign (ip T u) = sign {<p T v) \ \\u - v\\ 2 > e a ] < 1 



2 4 



Thus, for M different random vectors ipi arranged in $ = (ipi, • • • , fu) T ~ Af MxN (0, 1), and for 
the associated mapping A defined in we get 

P[3tt€flJ(p) J 3«€B|( q ):A(tt) = A(«) | ||u - v|| 2 > ej < (1 - f ) M . 

In other words, we have found a bound on the probability that two vectors' measurements are
consistent, even if their Euclidean distance is greater than $\epsilon_o$, but only for vectors in the restricted
(sub)sphere $\Sigma^*(T)$. Now we seek to cover the rest of the space $\Sigma^*_K$ (unit norm $K$-sparse signals).

Since there are no more than $\binom{|Q_\delta|}{2} \le |Q_\delta|^2 \le (3/\delta)^{2D}$ pairs of distinct points in $Q_\delta$, we find



'[3u,t) G E*(T) : A(u) = A(v) \ \\u - v\\ 2 > e ] < (^(12 + 3ttV2ttD)) 2D (1 



e a \M 



To obtain the final bound, we observe that any pair of unit $K$-sparse vectors $x$ and $s$ in
$\Sigma^*_K$ belongs to some $\Sigma^*(T)$ with $T = \operatorname{supp} x \cup \operatorname{supp} s$ and $|T| \le 2K$. There are no more than
$\binom{N}{2K} \le (eN/2K)^{2K}$ of such sets $T$, and thus setting $D = 2K$ above yields

F[3u,v£Z* K : A(u) = A(v) \ \\u - v\\ 2 > e ] 

< ( e 4f K (^(12 + ^^K)) AK (1 - f ) M 

< exp [2Klog(ff ) + 4^1og(^(12 + Gvrv 7 ^)) - Mf ] , 

where the second inequality follows from $1 - t \le \exp(-t)$. By upper bounding this probability by
$\eta$ and solving for $M$, we obtain

M > I (2K log §f + 4tflog(^(12 + evrVvri?)) + log ±). 

Since if > 1, we have that ^(12 + Gvr-v/vrif) < 17v/^, and thus the previous relation is then 
satisfied when 



M > J(2if log §f + 4Klog(^f^) + logi) 
= f (2if logiV + 4Klog(^) + logi). 



D Lemma 3: Concentration of Measure for $\delta$-Balls

Since $T \subset [N]$ is fixed with size $|T| = D$, proving Lemma 3 amounts to showing that, for any fixed
$\epsilon > 0$ and $0 \le \delta \le 1$, given a Gaussian matrix $\Phi \in \mathbb{R}^{M \times D}$, the mapping $A : \mathbb{R}^D \to \mathcal{B}^M$ defined as
$A(u) = \operatorname{sign}(\Phi u)$, and for any fixed $x, s \in S^{D-1}$, we have

$$ P\left[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ \left|d_H(A(u),A(v)) - d_S(x,s)\right| \le \epsilon + \sqrt{\tfrac{\pi}{2} D}\,\delta\right] \ge 1 - 2e^{-2\epsilon^2 M}, $$

where, in this case, $B^*_\delta(p) = (B_\delta(p) \cap S^{D-1}) \subset \mathbb{R}^D$ for any $p \in \mathbb{R}^D$.

Given $u' \in B^*_\delta(x)$ and $v' \in B^*_\delta(s)$, the quantity $M d_H(A(u'),A(v'))$ is the sum $\sum_i A_i(u') \oplus A_i(v')$,
where $A_i(u')$ stands for the $i$th component of $A(u')$. For one index $1 \le i \le M$,

$$ A_i(u') \oplus A_i(v') \le Z_i^+ := \max\{A_i(u) \oplus A_i(v) : u \in B^*_\delta(x), v \in B^*_\delta(s)\}, $$
$$ A_i(u') \oplus A_i(v') \ge Z_i^- := \min\{A_i(u) \oplus A_i(v) : u \in B^*_\delta(x), v \in B^*_\delta(s)\}, $$

and therefore

$$ Z^- := \sum_{i=1}^M Z_i^- \le M d_H(A(u'),A(v')) \le \sum_{i=1}^M Z_i^+ =: Z^+. $$

t=i i=i 

Of course, the occurrence of $Z_i^+ = 0$ ($Z_i^- = 1$) means that all vector pairs taken separately in
$B^*_\delta(x)$ and $B^*_\delta(s)$ have consistent (or respectively, inconsistent) measurements on the $i$th sensing
component $A_i$. More precisely, since $\varphi_i \sim \mathcal{N}^{N \times 1}(0,1)$, $Z_i^\pm$ are binary random variables such that
$P[Z_i^+ = 1] = 1 - p_0$ and $P[Z_i^- = 1] = p_1$ independently of $i$, where the probabilities $p_0$ and $p_1$ are
defined by

$$ p_0(x,s,\delta) = P[Z_i^+ = 0] = P[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ A_i(u) = A_i(v)], $$
$$ p_1(x,s,\delta) = P[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ A_i(u) \ne A_i(v)]. $$

In summary, $Z^+$ and $Z^-$ are binomially distributed with $M$ trials and probability of success
$1 - p_0$ and $p_1$, respectively. Furthermore, we have that $\mathbb{E} Z^+ = M(1 - p_0)$ and $\mathbb{E} Z^- = M p_1$; thus
by the Chernoff-Hoeffding inequality,

$$ P[Z^+ > M(1 - p_0) + M\epsilon] \le e^{-2M\epsilon^2}, $$
$$ P[Z^- < M p_1 - M\epsilon] \le e^{-2M\epsilon^2}. $$

This indicates that with a probability higher than $1 - 2e^{-2M\epsilon^2}$, we have

$$ p_1 - \epsilon \le d_H(A(u'),A(v')) \le (1 - p_0) + \epsilon. $$

The final result follows by lower bounding $p_0$ and $p_1$ as in Lemma 9.
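
The concentration behind this argument can be illustrated with a small Monte Carlo sketch in the degenerate case $\delta = 0$, where the balls reduce to the points $x$ and $s$ and, by the computation recalled in footnote 9, $p_1 = 1 - p_0 = d_S(x,s)$. The dimensions, seed, and helper names below are hypothetical choices for illustration, not values from the paper.

```python
import math
import random

def normalized_angle(x, s):
    """d_S(x, s) = arccos(<x, s> / (|x| |s|)) / pi."""
    dot = sum(a * b for a, b in zip(x, s))
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in s))
    return math.acos(max(-1.0, min(1.0, dot / norms))) / math.pi

def sign_measurements(rows, u):
    """A(u) = sign(Phi u), with Phi given as a list of rows."""
    return [1 if sum(p * c for p, c in zip(row, u)) >= 0 else -1 for row in rows]

def normalized_hamming(a, b):
    """d_H(a, b): fraction of entries in which two sign vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b)) / len(a)

rng = random.Random(1)            # fixed seed, illustration only
D, M, eps = 5, 2000, 0.05         # hypothetical sizes
Phi = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(M)]
x = [1.0, 0.0, 0.0, 0.0, 0.0]
s = [math.cos(0.3 * math.pi), math.sin(0.3 * math.pi), 0.0, 0.0, 0.0]

d_s = normalized_angle(x, s)
d_h = normalized_hamming(sign_measurements(Phi, x), sign_measurements(Phi, s))
failure_bound = 2.0 * math.exp(-2.0 * eps ** 2 * M)   # Chernoff-Hoeffding tail
print(d_s, d_h, failure_bound)
```

With these sizes the Chernoff-Hoeffding bound on the probability that $d_H$ deviates from $d_S$ by more than `eps` is about $2e^{-10}$.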

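Lemma 9. Given $0 \le \delta \le 1$ and two unit vectors $x, s \in S^{D-1}$, we have

$$ p_0 = P[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ \operatorname{sign}\langle\varphi,u\rangle = \operatorname{sign}\langle\varphi,v\rangle] \ge 1 - d_S(x,s) - \sqrt{\tfrac{\pi}{2} D}\,\delta, \tag{23} $$
$$ p_1 = P[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ \operatorname{sign}\langle\varphi,u\rangle \ne \operatorname{sign}\langle\varphi,v\rangle] \ge d_S(x,s) - \sqrt{\tfrac{\pi}{2} D}\,\delta. \tag{24} $$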

Proof of Lemma 9. We begin by introducing some useful properties of the Gaussian vector distribution.
If $\varphi \sim \mathcal{N}^{D \times 1}(0,1)$, the probability that $\varphi \in A \subset \mathbb{R}^D$ is simply the measure $\mu$ of $A$ with respect to
the standard Gaussian density $\gamma(\varphi) = (2\pi)^{-D/2} e^{-\|\varphi\|^2/2}$, i.e.,

$$ P[\varphi \in A] = \mu(A) = \int_A \mathrm{d}^D\varphi\ \gamma(\varphi), $$

with $\mu(\mathbb{R}^D) = 1$. It may be easier to perform this integration over a hyper-spherical set of
coordinates measured in a basis defined by the vectors $x$ and $s$. This is possible since the pdf $\gamma$ is
rotationally invariant.
Specifically, we consider the canonical basis $\mathcal{E} = \{e_1, \dots, e_D\}$ of $\mathbb{R}^D$ where, by using the
cross product $\wedge$ in $\mathbb{R}^D$, $e_1 := (x \wedge s)/\|x \wedge s\|_2$, $e_D := x$ and $e_{D-1} := e_D \wedge e_1$, while the
other vectors $\{e_k : 2 \le k \le D - 2\}$ are defined arbitrarily for completing the basis. In this
system, the "xs" plane is equivalent to the plane spanned by $e_D$ and $e_{D-1}$. Moreover, any vector
$\varphi \in \mathbb{R}^D$ can be represented by the spherical coordinates $(r, \phi_1, \dots, \phi_{D-1})$ where $r = \|\varphi\|_2 \in \mathbb{R}_+$,
$(\phi_1, \dots, \phi_{D-2}) \in [0, \pi]^{D-2}$ corresponds to the vector angles in each dimension, and $\phi_{D-1} \in [0, 2\pi]$
is the angle formed by the projection of $\varphi$ in the "xs" plane with $x = e_D$.$^9$

The change of coordinates between the Cartesian and the spherical representations of $\varphi$ in $\mathcal{E}$ is
then defined as $\varphi_1 = r\cos\phi_1$, $\varphi_2 = r\sin\phi_1\cos\phi_2$, $\dots$, $\varphi_{D-1} = r\sin\phi_1 \cdots \sin\phi_{D-2}\cos\phi_{D-1}$, and
$\varphi_D = r\sin\phi_1 \cdots \sin\phi_{D-2}\sin\phi_{D-1}$, while, conversely, $r = \|\varphi\|_2$, $\tan\phi_1 = (\varphi_D^2 + \cdots + \varphi_2^2)^{1/2}/\varphi_1$, $\dots$,
$\tan\phi_{D-2} = (\varphi_D^2 + \varphi_{D-1}^2)^{1/2}/\varphi_{D-2}$, and $\tan\phi_{D-1} = \varphi_D/\varphi_{D-1}$.

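We now seek a lower bound on $p_1$. Computing this probability amounts to estimating

$$ p_1 = P[\forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s),\ \langle\varphi,u\rangle\langle\varphi,v\rangle \le 0] = \mu(W_\delta), $$

where $W_\delta := \{\varphi : \langle\varphi,u\rangle\langle\varphi,v\rangle \le 0,\ \forall u \in B^*_\delta(x), \forall v \in B^*_\delta(s)\}$ is the set of all vectors whose
inner products with $u$ and $v$ have different signs.

Note that if $B^*_\delta(x) \cap B^*_\delta(s) \ne \emptyset$, then $p_1 = 0$ since $p_1 \le P[\forall u \in B^*_\delta(x) \cap B^*_\delta(s),\ \langle\varphi,u\rangle^2 =
0] = 0$. This non-empty intersection is avoided when $d_S(x,s) > \tfrac{4}{\pi}\arcsin(\delta/2)$. Furthermore, since
$\arcsin\lambda \le \tfrac{\pi}{2}\lambda$ for any $0 \le \lambda \le 1$, this occurs if $d_S(x,s) > \delta$.

The remainder of the proof is devoted to finding an appropriate way to integrate the set $W_\delta$.
To this end, we begin by demonstrating that estimating $p_1$ can be simplified with the following
equivalence (proved just after the completion of the proof of Lemma 9).

Lemma 10. The set $W_\delta \subset \mathbb{R}^D$ is equal to the set

$$ V_\delta^- = \{\varphi : \langle\varphi,x\rangle\langle\varphi,s\rangle \le 0,\ \|x - \mathcal{P}_{\Pi(\varphi)}\, x\| \ge \delta,\ \|s - \mathcal{P}_{\Pi(\varphi)}\, s\| \ge \delta\}, $$

where $\mathcal{P}_{\Pi(\varphi)}$ is the orthogonal projection on the plane $\Pi(\varphi) = \{u \in \mathbb{R}^D : \langle\varphi,u\rangle = 0\}$.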

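Using the hyper-spherical coordinate system developed earlier and denoting the angle $\pi\, d_S(x,s)$
by $\theta$, membership in $V_\delta^-$ can be expressed as

$$ \tan\phi_{D-1} \in [0, \tan\theta], \tag{R1} $$
$$ \sin\phi_1 \cdots \sin\phi_{D-2}\, |\sin\phi_{D-1}| \ge \delta, \tag{R2} $$
$$ \sin\phi_1 \cdots \sin\phi_{D-2}\, |\sin(\phi_{D-1} - \theta)| \ge \delta. \tag{R3} $$

Indeed, requirement (R1) enforces $\langle\varphi,x\rangle\langle\varphi,s\rangle \le 0$, while (R2) and (R3) are direct translations
of the requirements that $\|x - \mathcal{P}_{\Pi(\varphi)}\, x\| = |\langle\hat\varphi, x = e_D\rangle| \ge \delta$ and $\|s - \mathcal{P}_{\Pi(\varphi)}\, s\| = |\langle\hat\varphi, s =
-\sin\theta\, e_{D-1} + \cos\theta\, e_D\rangle| \ge \delta$, with $\hat\varphi = \varphi/\|\varphi\|$.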

9 This change of coordinates can be very convenient. For instance, the proof of Lemma 2 relies on the computation
$P[A_i(x) \ne A_i(s)] = \mu(A = \{\varphi : \phi_{D-1} \in [0, \pi d_S(x,s)] \cup [\pi, \pi + \pi d_S(x,s)]\}) = d_S(x,s)$, since for (almost) all $\varphi \in A$,
$x$ and $s$ live in the two different subvolumes determined by the plane $\{u : \langle\varphi,u\rangle = 0\}$ [57, 58].
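We are now ready to integrate to find $p_1$:

$$ p_1 = \mu(V_\delta^-) = \frac{1}{(2\pi)^{D/2}} \int_{\mathbb{R}_+} \mathrm{d}r\, r^{D-1} e^{-r^2/2} \int_0^\pi \mathrm{d}\phi_1 \sin^{D-2}\phi_1 \cdots \int_0^\pi \mathrm{d}\phi_{D-2} \sin\phi_{D-2} \left[ \int_{[0,\theta] \cup [\pi,\pi+\theta]} \mathrm{d}\phi_{D-1}\, \chi_{g(\delta,\phi)}(\phi_{D-1})\, \chi_{g(\delta,\phi)}(\phi_{D-1} - \theta) \right], $$

with $\chi_\lambda(\phi) = 1$ if $|\sin\phi| \ge \lambda$ and 0 else, for some $\lambda \in [0,1]$, and $g(\delta,\phi) = \delta/(\sin\phi_1 \cdots \sin\phi_{D-2})$.
However,

$$ \int_{[0,\theta] \cup [\pi,\pi+\theta]} \mathrm{d}\phi\ \chi_\lambda(\phi)\, \chi_\lambda(\phi - \theta) = \max(2\theta - 4\arcsin\lambda,\ 0), $$

and $\max(2\theta - 4\arcsin\lambda, 0) \ge 2\theta - 2\pi\lambda$, since $\lambda \le \arcsin\lambda \le \tfrac{\pi}{2}\lambda$ for any $\lambda \in [0,1]$. Consequently,

$$ \mu(V_\delta^-) \ge \frac{1}{(2\pi)^{D/2}} \int_{\mathbb{R}_+} \mathrm{d}r\, r^{D-1} e^{-r^2/2} \int_0^\pi \mathrm{d}\phi_1 \sin^{D-2}\phi_1 \cdots \int_0^\pi \mathrm{d}\phi_{D-2} \sin\phi_{D-2} \left[ 2\theta - \frac{2\pi\delta}{\sin\phi_1 \cdots \sin\phi_{D-2}} \right] $$
$$ = \frac{\theta}{\pi} - 2\pi\delta\, \frac{I_{D-3} \cdots I_0}{(2I_0)\, I_1 \cdots I_{D-2}} = \frac{\theta}{\pi} - \frac{\pi}{I_{D-2}}\, \delta, $$

with $I_n := \int_0^\pi \mathrm{d}\phi\, \sin^n\phi$ and knowing that $(2\pi)^{D/2} = (2I_0)(I_1 \cdots I_{D-2}) \int_{\mathbb{R}_+} \mathrm{d}r\, r^{D-1} e^{-r^2/2}$ since

$$ \mu(\mathbb{R}^D) = 1 = (2\pi)^{-D/2} \left( \int_0^{2\pi} \mathrm{d}\phi_{D-1} \right) (I_1 \cdots I_{D-2}) \int_{\mathbb{R}_+} \mathrm{d}r\, r^{D-1} e^{-r^2/2}. $$

Using the fact that $I_n = \sqrt{\pi}\, \Gamma(\tfrac{n+1}{2}) / \Gamma(\tfrac{n}{2} + 1) \ge \sqrt{\pi} / \sqrt{\tfrac{n}{2} + 1}$, we obtain $I_{D-2} \ge \sqrt{2\pi/D}$,
and thus, since $\theta/\pi = d_S(x,s)$,

$$ p_1 \ge d_S(x,s) - \sqrt{\tfrac{\pi}{2} D}\,\delta. $$

If we want a meaningful bound for $p_1 > 0$, then we must have $d_S(x,s) > \sqrt{\tfrac{\pi}{2} D}\,\delta \ge \delta$. Therefore,
as soon as the lower bound is positive, the aforementioned condition $d_S(x,s) > \delta$ always holds.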
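
The constant in front of $\delta$ can be checked numerically. The sketch below, with hypothetical function names, evaluates $I_n = \int_0^\pi \sin^n\phi\, \mathrm{d}\phi$ through the Gamma-function formula, cross-checks it against a simple midpoint rule, and compares the coefficient $\pi/I_{D-2}$ with $\sqrt{\pi D/2}$.

```python
import math

def sine_power_integral(n):
    """I_n = sqrt(pi) * Gamma((n + 1) / 2) / Gamma(n / 2 + 1), computed via lgamma."""
    return math.sqrt(math.pi) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2 + 1))

def numeric_integral(n, steps=20000):
    """Midpoint-rule approximation of I_n, used only as a cross-check."""
    h = math.pi / steps
    return h * sum(math.sin((k + 0.5) * h) ** n for k in range(steps))

def delta_coefficient(D):
    """Coefficient pi / I_{D-2} that multiplies delta in the bound on p_1."""
    return math.pi / sine_power_integral(D - 2)

for D in (2, 3, 10, 100):
    print(D, delta_coefficient(D), math.sqrt(math.pi * D / 2.0))
```

The two columns agree up to a modest factor for small $D$ and approach each other as $D$ grows, so replacing $\pi/I_{D-2}$ by $\sqrt{\pi D/2}$ costs little.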

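The lower bound for $p_0$ is obtained similarly. It is straightforward to show that $p_0 = \mu(V_\delta^+)$,
with $V_\delta^+ = \{\varphi : \langle\varphi,x\rangle\langle\varphi,s\rangle \ge 0,\ \|x - \mathcal{P}_{\Pi(\varphi)}\, x\| \ge \delta,\ \|s - \mathcal{P}_{\Pi(\varphi)}\, s\| \ge \delta\}$. We lower bound $\mu(V_\delta^+)$
as for $\mu(V_\delta^-)$, the only difference occurring with the integral on $\phi_{D-1}$, given by

$$ \int_{[\theta,\pi] \cup [\pi+\theta,2\pi]} \mathrm{d}\phi_{D-1}\, \chi_{g(\delta,\phi)}(\phi_{D-1})\, \chi_{g(\delta,\phi)}(\phi_{D-1} - \theta) = 2\pi - 2\theta - 4\arcsin g(\delta,\phi) \ge 2(\pi - \theta) - 2\pi g(\delta,\phi). $$

Therefore, the lower bound of $p_0$ amounts to changing $\theta \to \pi - \theta$ in the one of $p_1$, which provides the
result. □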


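Proof of Lemma 10. If $\delta = 0$, there is nothing to prove. Therefore $\delta > 0$ and if $\varphi^*$ belongs to either
$V_\delta^-$ or $W_\delta$, we must have $\langle\varphi^*,x\rangle\langle\varphi^*,s\rangle < 0$. It is also sufficient to work on the restriction of $V_\delta^-$ and
$W_\delta$ to unit vectors.

(i) $V_\delta^- \subset W_\delta$: By contradiction, let us assume that $\varphi^* \in V_\delta^-$ but $\varphi^* \notin W_\delta$. Without any loss of
generality, $\langle\varphi^*,x\rangle > 0$ and $\langle\varphi^*,s\rangle < 0$. Since $\varphi^* \notin W_\delta$, there exist two vectors $u^* \in B^*_\delta(x)$ and
$v^* \in B^*_\delta(s)$ such that $\langle\varphi^*,u^*\rangle\langle\varphi^*,v^*\rangle > 0$. If $\langle\varphi^*,u^*\rangle > 0$ and $\langle\varphi^*,v^*\rangle > 0$, then, since $\langle\varphi^*,s\rangle < 0$
and by continuity of the inner product, there exists a $\lambda \in (0,1)$ such that $\langle\varphi^*,s(\lambda)\rangle = 0$ with
$s(\lambda) = s + \lambda(v^* - s)$. Therefore, $s(\lambda) \in \Pi(\varphi^*)$ and, by definition of the orthogonal projection,
$\|s - \mathcal{P}_{\Pi(\varphi^*)}\, s\| \le \|s - s(\lambda)\| \le \lambda\delta < \delta$, which is a contradiction. If $\langle\varphi^*,u^*\rangle < 0$ and $\langle\varphi^*,v^*\rangle < 0$, we
apply the same reasoning on $x$ and $u^*$. Therefore, $V_\delta^- \subset W_\delta$.

(ii) $W_\delta \subset V_\delta^-$: If $\varphi^* \in W_\delta$ with $\varphi^* \notin V_\delta^-$, we have either $\|x - \mathcal{P}_{\Pi(\varphi^*)}\, x\| < \delta$ or $\|s - \mathcal{P}_{\Pi(\varphi^*)}\, s\| < \delta$.
Let us say that $\|x - \mathcal{P}_{\Pi(\varphi^*)}\, x\| < \delta$. Then, for $w = x + \delta\, (\mathcal{P}_{\Pi(\varphi^*)}\, x - x)/\|\mathcal{P}_{\Pi(\varphi^*)}\, x - x\| \in B^*_\delta(x)$,
$\langle\varphi^*,x\rangle\langle\varphi^*,w\rangle = \langle\varphi^*,x\rangle^2\, (1 - \delta/\|\mathcal{P}_{\Pi(\varphi^*)}\, x - x\|) + \delta\, \langle\varphi^*,x\rangle\langle\varphi^*, \mathcal{P}_{\Pi(\varphi^*)}\, x\rangle/\|\mathcal{P}_{\Pi(\varphi^*)}\, x - x\| < 0$,
since $\langle\varphi^*, \mathcal{P}_{\Pi(\varphi^*)}\, x\rangle = 0$. However, $\varphi^* \in W_\delta$ and
$\langle\varphi^*,x\rangle\langle\varphi^*,s\rangle < 0$, leading to $\langle\varphi^*,w\rangle\langle\varphi^*,s\rangle > 0$, which is a contradiction. □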



E Theorem 3: Gaussian Matrices Provide BεSEs

The strategy for proving Theorem 3 will be to count the number of pairs of $K$-sparse signals that are Euclidean distance $\delta$ apart. We will then apply the concentration results of Lemma 3 to demonstrate that the angles between these pairs are approximately preserved. We specifically proceed by focusing on a single $K$-dimensional subspace (intersected with the unit sphere) and then by applying a union bound to account for all possible subspaces.
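As a rough numerical illustration of the quantity this proof controls, the following Python sketch compares the normalized Hamming distance between the sign measurements of two $K$-sparse unit vectors with their angular distance. It assumes $A(x) = \mathrm{sign}(\Phi x)$ and $d_S(x,s) = \frac{1}{\pi}\arccos\langle x, s\rangle$; the dimensions, the seed, and the helper names are illustrative choices, not values from the text.

```python
import numpy as np

def angular_distance(x, s):
    # Normalized angle d_S(x, s) = arccos(<x, s>) / pi for nonzero vectors.
    c = np.dot(x, s) / (np.linalg.norm(x) * np.linalg.norm(s))
    return np.arccos(np.clip(c, -1.0, 1.0)) / np.pi

def hamming_distance(a, b):
    # Normalized Hamming distance d_H between two sign vectors.
    return np.mean(a != b)

def random_sparse_unit(rng, N, support):
    # Unit-norm vector in R^N supported on the index set `support`.
    x = np.zeros(N)
    x[support] = rng.standard_normal(len(support))
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
N, K, M = 100, 5, 5000                    # illustrative sizes
T = rng.choice(N, size=K, replace=False)  # common support of size K
x = random_sparse_unit(rng, N, T)
s = random_sparse_unit(rng, N, T)

Phi = rng.standard_normal((M, N))         # i.i.d. Gaussian measurement matrix
d_H = hamming_distance(np.sign(Phi @ x), np.sign(Phi @ s))
d_S = angular_distance(x, s)

# Only the columns indexed by T matter for vectors supported on T.
same_measurements = np.allclose(Phi[:, T] @ x[T], Phi @ x)

print(f"d_H = {d_H:.4f}, d_S = {d_S:.4f}, |d_H - d_S| = {abs(d_H - d_S):.4f}")
```

For a single fixed pair the gap shrinks roughly like $1/\sqrt{M}$; the proof below is about making such a statement hold uniformly over all sparse pairs, which is where the covering and union-bound arguments come in.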

Let $T \subset [N]$ be an index set of size $|T| = K$, and let $\Sigma^*(T) = \{w \in \mathbb{R}^N : \mathrm{supp}\, w \subset T,\ \|w\|_2 = 1\}$ be the sphere of unit vectors with support $T$. We first use again the fact that the sphere $\Sigma^*(T)$ can be $\delta$-covered by a finite set of points $Q_{T,\delta}$. That is, for any $w \in \Sigma^*(T)$, there exists a $q \in Q_{T,\delta}$ such that $w \in B^*_\delta(q) = B_\delta(q) \cap \Sigma^*(T) = \{w' \in \Sigma^*(T) : \|w' - q\|_2 \le \delta\}$ [18]. Note that the size of $Q_{T,\delta}$ is bounded by $|Q_{T,\delta}| \le C_\delta = (3/\delta)^K$.

Let $\Phi_T$ be the matrix formed by the columns of $\Phi$ indexed by $T$ and note that $\Phi_T w = \Phi w$. Given $\epsilon' > 0$, for all pairs of points $p, q \in Q_{T,\delta}$, we have



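$$\mathbb{P}\Big( \forall u \in B^*_\delta(p),\ \forall v \in B^*_\delta(q),\ \ |d_H(A(u),A(v)) - d_S(p,q)| \le \epsilon' + \sqrt{\tfrac{\pi}{2} K}\,\delta \Big) \ge 1 - 2\,(3/\delta)^{2K}\, e^{-2\epsilon'^2 M}. \quad (25)$$

This follows from Lemma 3 with $D = K$, since $\Phi_T$ is a Gaussian matrix, and by invoking the union bound, since there are $\binom{C_\delta}{2} \le C_\delta^2 = (3/\delta)^{2K}$ such pairs $p, q$.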



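The bound (25) can be extended to all possible index sets $T$ of size $K$ via the union bound. Specifically, for all $T \subset [N]$ and all pairs of points $p, q \in Q_{T,\delta}$, we now have jointly

$$\mathbb{P}\Big( \forall u \in B^*_\delta(p),\ \forall v \in B^*_\delta(q),\ \ |d_H(A(u),A(v)) - d_S(p,q)| \le \epsilon' + \sqrt{\tfrac{\pi}{2} K}\,\delta \Big) \ge 1 - 2\,(eN/K)^K\,(3/\delta)^{2K}\, e^{-2\epsilon'^2 M}, \quad (26)$$

since there are no more than $\binom{N}{K} \le (eN/K)^K$ possible $T$.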

We can reformulate this last result as follows. Let us take any pair of points on the sphere $x, s \in S^{N-1}$ such that their joint support $T = \mathrm{supp}(x) \cup \mathrm{supp}(s)$ has a size $|T| \le K$. We obviously have $x, s \in \Sigma^*(T)$. Taking the covering set $Q_{T,\delta}$ defined for $\Sigma^*(T)$, there exist two points $p, q \in Q_{T,\delta}$ such that $x \in B^*_\delta(p)$ and $s \in B^*_\delta(q)$. From (26), with a probability exceeding $1 - 2\,(eN/K)^K\,(3/\delta)^{2K}\, e^{-2\epsilon'^2 M}$, we have

$$|d_H(A(x),A(s)) - d_S(p,q)| \le \epsilon' + \sqrt{\tfrac{\pi}{2} K}\,\delta. \quad (27)$$



To obtain our final bound, consider that $x \in B^*_\delta(p)$ implies that $\pi\, d_S(x,p) \le 2\arcsin(\delta/2) \le \pi\delta/2$, and $d_S(s,q)$ can be similarly bounded. Thus, $d_S(x,s) \ge d_S(p,q) - \delta$ and $d_S(x,s) \le d_S(p,q) + \delta$, and (27) becomes

$$|d_H(A(x),A(s)) - d_S(x,s)| \le \epsilon' + \big(1 + \sqrt{\tfrac{\pi}{2} K}\big)\,\delta. \quad (28)$$

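Let us define the probability of failure as $2\,(eN/K)^K\,(3/\delta)^{2K}\, e^{-2\epsilon'^2 M} = \eta$, where $0 < \eta < 1$, and set $\epsilon' = (1 + \sqrt{\tfrac{\pi}{2} K})\,\delta$ and $2\epsilon' = \epsilon$. Solving for $M$, we finally get that $|d_H(A(x),A(s)) - d_S(x,s)| \le \epsilon$ with a probability bigger than $1 - \eta$ if

$$M \ge \frac{2}{\epsilon^2}\Big( K\log\big(\tfrac{eN}{K}\big) + 2K\log\Big(\tfrac{6\,(1+\sqrt{\pi K/2})}{\epsilon}\Big) + \log\big(\tfrac{2}{\eta}\big) \Big).$$

Since $K \ge 1$, we have that $2(1+\sqrt{\pi K/2}) \le 2(1+\sqrt{2\pi K}) \le 2(1+\sqrt{2\pi})\sqrt{K} < 35\sqrt{K}/\sqrt{9e}$, and thus the previous relation is satisfied if

$$M \ge \frac{2}{\epsilon^2}\Big( K\log\big(\tfrac{eN}{K}\big) + 2K\log\Big(\tfrac{35\sqrt{K}}{\sqrt{e}\,\epsilon}\Big) + \log\big(\tfrac{2}{\eta}\big) \Big) = \frac{2}{\epsilon^2}\Big( K\log(N) + 2K\log\big(\tfrac{35}{\epsilon}\big) + \log\big(\tfrac{2}{\eta}\big) \Big).$$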
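To get a feel for the scale of this condition, the sketch below evaluates it numerically. It reads the final bound as $M \ge \frac{2}{\epsilon^2}\big(K\log N + 2K\log(35/\epsilon) + \log(2/\eta)\big)$ with natural logarithms, which is an assumption about the garbled display above; the function name and the sample parameter values are hypothetical.

```python
import math

def bse_measurements(N, K, eps, eta):
    # Smallest integer M satisfying the sufficient condition, read as
    # M >= (2 / eps^2) * (K log N + 2K log(35 / eps) + log(2 / eta)).
    bound = (2.0 / eps**2) * (
        K * math.log(N) + 2 * K * math.log(35.0 / eps) + math.log(2.0 / eta)
    )
    return math.ceil(bound)

# Numerical constant used in the last simplification step of the proof.
lhs = 2.0 * (1.0 + math.sqrt(2.0 * math.pi))
rhs = 35.0 / math.sqrt(9.0 * math.e)
print(f"2(1 + sqrt(2 pi)) = {lhs:.4f} < 35 / sqrt(9 e) = {rhs:.4f}")

for eps in (0.4, 0.2, 0.1):
    M = bse_measurements(N=1000, K=10, eps=eps, eta=0.01)
    print(f"eps = {eps}: M >= {M}")
```

The $1/\epsilon^2$ factor dominates: halving $\epsilon$ roughly quadruples the number of measurements this sufficient condition asks for, with the logarithmic terms adding a further mild increase.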
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F Lemma Stability with Measurement Noise 



In the lemma, since $\Phi \sim \mathcal{N}^{M\times N}(0,1)$, each $y_i = (\Phi x)_i$ follows a Gaussian distribution $\mathcal{N}(0, \|x\|_2^2)$, and furthermore, since we have independent additive noise, $z_i = y_i + n_i = (\Phi x)_i + n_i$ follows the Gaussian distribution $\mathcal{N}(0, \|x\|_2^2 + \sigma^2)$.

We begin by bounding the probability that any noisy measurement $z_i$ has a different sign than the original corresponding measurement $y_i$, i.e., we bound $p := \mathbb{P}(z_i y_i < 0)$. This quantity is interesting since $M\, d_H(A_n(x), A(x))$ follows a Binomial distribution with $M$ trials and probability of success $p$, and thus we also have $\mathbb{E}\big(d_H(A_n(x), A(x))\big) = p$.
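To solve for the bound, we compute

$$p = \int_{\mathbb{R}} du\; \mathbb{P}(z_i y_i < 0 \mid y_i = u)\, f_{y_i}(u) = \int_{\mathbb{R}} du\; \mathbb{P}(u^2 + u\, n_i < 0)\, g(u; \|x\|_2),$$

with the pdf $f_{y_i}(t) = g(t; \sigma') = \frac{1}{\sqrt{2\pi}\,\sigma'} \exp(-t^2/2\sigma'^2)$ for $\sigma' = \|x\|_2$. This leads to

$$p = \int_0^{\infty} du\; \mathbb{P}(n_i < -u)\, g(u; \|x\|_2) + \int_{-\infty}^{0} du\; \mathbb{P}(n_i > -u)\, g(u; \|x\|_2)$$

$$= \int_0^{\infty} du\; 2\,Q(u/\sigma)\, g(u; \|x\|_2) \le \int_0^{\infty} du\; e^{-\frac{u^2}{2\sigma^2}}\, g(u; \|x\|_2)$$

$$= \frac{1}{\sqrt{2\pi}\,\|x\|_2} \int_0^{\infty} du\; e^{-\frac{u^2}{2}\left(\frac{1}{\sigma^2} + \frac{1}{\|x\|_2^2}\right)} = \frac{1}{2}\, \frac{\sigma}{\sqrt{\|x\|_2^2 + \sigma^2}},$$

where $Q(u) = \int_u^{\infty} dt\, g(t; 1)$ denotes the tail integral of the standard Gaussian distribution, which is bounded by $Q(t) \le \frac{1}{2} e^{-t^2/2}$ for $t \ge 0$ (see for instance [71, Eq. (13.48)]).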

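Thus, we have $p \le \epsilon(\sigma, \|x\|_2) = \frac{1}{2}\, \frac{\sigma}{\sqrt{\|x\|_2^2 + \sigma^2}}$ and, by applying the Chernoff-Hoeffding inequality to the distribution of $d_H(A_n(x), A(x))$,

$$\mathbb{P}\big[ M\, d_H(A_n(x), A(x)) > M\,\epsilon(\sigma, \|x\|_2) + M\epsilon \big] \le \mathbb{P}\big[ M\, d_H(A_n(x), A(x)) > M p + M\epsilon \big] \le e^{-2M\epsilon^2},$$

which proves the lemma.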
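The bound on the sign-flip probability can be sanity-checked by simulation. The following Python sketch is an illustration only: it assumes $A(x) = \mathrm{sign}(\Phi x)$, $A_n(x) = \mathrm{sign}(\Phi x + n)$ with i.i.d. Gaussian noise of standard deviation $\sigma$, and the bound $\epsilon(\sigma, \|x\|_2) = \frac{1}{2}\sigma/\sqrt{\|x\|_2^2 + \sigma^2}$; the sizes, the seed, and the function name are hypothetical choices.

```python
import numpy as np

def flip_bound(sigma, x_norm):
    # Upper bound on the sign-flip probability p derived in the proof.
    return 0.5 * sigma / np.sqrt(x_norm**2 + sigma**2)

rng = np.random.default_rng(1)
M, N, sigma = 100_000, 20, 0.5      # illustrative sizes and noise level

x = rng.standard_normal(N)
x /= np.linalg.norm(x)              # unit-norm signal, so ||x||_2 = 1

Phi = rng.standard_normal((M, N))   # i.i.d. Gaussian measurement matrix
y = Phi @ x                         # noiseless measurements
z = y + sigma * rng.standard_normal(M)  # noisy measurements

# Fraction of measurements whose sign is flipped by the noise.
d_H = np.mean(np.sign(y) != np.sign(z))
bound = flip_bound(sigma, np.linalg.norm(x))

print(f"empirical flip rate = {d_H:.4f}, bound = {bound:.4f}")
```

The empirical flip rate is an estimate of $p$, so it should sit below the bound up to a fluctuation of order $1/\sqrt{M}$, in line with the Chernoff-Hoeffding step.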



G Asymptotic bound on ε in Theorems 2 and 3

Both Theorem 2 and Theorem 3 provide guarantees on the worst-case error $\epsilon$ of the form

$$\epsilon^n \le \frac{1}{M}\Big( \alpha K \log(N) + \beta \log\big(\tfrac{1}{\eta}\big) + \gamma K \log\big(\tfrac{1}{\epsilon}\big) \Big), \quad (29)$$

for some exponent $n \in \{1, 2\}$, and for given constants $\alpha, \beta, \gamma > 0$.

In this appendix we show that, considering $0 < \eta < 1$ fixed, the relation (29) implies

$$\epsilon^n = O\Big( \frac{K}{M} \log\big(\tfrac{MN}{K}\big) \Big) \quad (30)$$

asymptotically in $M/K$ and $N$. Notice that, up to a redefinition $\epsilon^n \to \epsilon$ and $\gamma/n \to \gamma$, it is sufficient to prove the relation for $n = 1$. We also define $\rho := \beta \log(1/\eta)$.

First, we consider $N$ fixed and show $\epsilon = O\big(\frac{K}{M}\log(\frac{M}{K})\big)$. Let us assume this is not the case, i.e., for all $c > 0$ and all $R_0 > 0$, there exists a ratio $M/K \ge R_0$ such that $\epsilon > c\,(K/M)\log(M/K)$. Therefore,

$$\log\tfrac{1}{\epsilon} < \log\tfrac{M}{K} - \log\big(c \log\tfrac{M}{K}\big) \le \log\tfrac{M}{K} - \log(c \log R_0).$$

Thus (29) becomes

$$\epsilon < \frac{1}{M}\Big( \alpha K \log N + \rho + \gamma K \log\tfrac{M}{K} - \gamma K \log(c \log R_0) \Big).$$

Using $\epsilon > c\,(K/M)\log(M/K)$, this last inequality becomes

$$\alpha \log N + \tfrac{1}{K}\rho + \gamma \log\tfrac{M}{K} > c \log\tfrac{M}{K} + \gamma \log(c \log R_0). \quad (31)$$

For fixed $N$ and $\eta$, and since we reasonably have $K \ge 1$, the parameters $c$ and $R_0$ can always be selected so that $\gamma \log(c \log R_0) > \alpha \log N + \rho/K$. In this case, (31) implies $\gamma \log\frac{M}{K} > c \log\frac{M}{K}$. Taking $c > \gamma$, which is still compatible with the selection of $R_0$ and $c$ above, leads to a contradiction. Thus $\epsilon = O\big(\frac{K}{M}\log\frac{M}{K}\big)$ for fixed $N$.

Next, we assume $N$ varies and $R := M/K$ is fixed, and show that $\epsilon^n = O\big((1/R)\log(RN)\big)$. We again restrict the analysis to $n = 1$. Now we assume that for all $N_0 > 0$ and all $c > 0$, there is an $N \ge N_0$ such that $\epsilon > (c/R)\log(RN)$. This means that $\log\frac{1}{\epsilon} < -\log\big((c/R)\log(RN)\big)$, and (29) implies, using $\rho/M \le \rho$,

$$\frac{c}{R}\log(RN) + \frac{\gamma}{R}\log\Big(\frac{c}{R}\log(RN)\Big) < \frac{\alpha}{R}\log N + \rho.$$

Since $\alpha$ and $\beta$ are fixed, selecting $c > \max(\alpha, R\rho/\log R)$ and $N_0$ such that $\log(RN_0) > R/c$ leads to a contradiction and completes the proof.
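The asymptotic claim can also be illustrated numerically. The sketch below assumes relation (29) with $n = 1$ holds with equality, takes $\alpha = \beta = \gamma = 1$ and $\eta = 0.1$ as purely hypothetical constants, solves for the largest admissible $\epsilon$ by bisection, and compares it with $\frac{K}{M}\log\frac{MN}{K}$ from (30).

```python
import math

def largest_eps(M, K, N, alpha=1.0, beta=1.0, gamma=1.0, eta=0.1):
    # Largest eps satisfying
    #   eps <= (alpha K log N + rho + gamma K log(1/eps)) / M,
    # with rho = beta log(1/eta). The residual f is increasing in eps,
    # so its unique root is the largest admissible value.
    rho = beta * math.log(1.0 / eta)

    def f(eps):
        rhs = (alpha * K * math.log(N) + rho
               + gamma * K * math.log(1.0 / eps)) / M
        return eps - rhs

    lo, hi = 1e-12, 1e6          # bracket with f(lo) < 0 < f(hi) for these sizes
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi

K, N = 10, 1000
ratios = []
for M in (10**2, 10**3, 10**4, 10**5, 10**6):
    eps = largest_eps(M, K, N)
    reference = (K / M) * math.log(M * N / K)
    ratios.append(eps / reference)
    print(f"M = {M:>7}: eps = {eps:.3e}, "
          f"(K/M) log(MN/K) = {reference:.3e}, ratio = {ratios[-1]:.3f}")
```

In this small experiment the ratio stays of order one as $M/K$ grows over several orders of magnitude, which is what the $O(\cdot)$ statement in (30) predicts; it is an illustration for one choice of constants, not a substitute for the proof.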
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